CORRECTED VERSION*



PCT 



WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 5: C12Q 1/68, A01H 1/04

A1

(11) International Publication Number: WO 90/04651

(43) International Publication Date: 3 May 1990 (03.05.90)



(21) International Application Number: PCT/US89/04688 

(22) International Filing Date: 19 October 1989 (19.10.89) 



(30) Priority data: 259,998, 19 October 1988 (19.10.88), US



(71) Applicants: WHITEHEAD INSTITUTE FOR BIOMEDICAL RESEARCH [US/US]; Nine Cambridge Center, Cambridge, MA 02142 (US). CORNELL RESEARCH FOUNDATION, INC. [US/US]; East Hill Plaza, Ithaca, NY 14850 (US).

(72) Inventors: LANDER, Eric, S.; 151 Bishop Richard Allen Drive, Cambridge, MA 02139 (US). PATERSON, Andrew, H.; 1026 Ellis Hollow Road, Apt. 32, Ithaca, NY 14853 (US). TANKSLEY, Steven, D.; 215 Connecticut Hill Road, Newfield, NY 14867 (US).



(74) Agents: GRANAHAN, Patricia et al.; Hamilton, Brook, 
Smith & Reynolds, Two Militia Drive, Lexington, MA 
02173 (US). 



(81) Designated States: AT (European patent), BE (European patent), CH (European patent), DE (European patent), FR (European patent), GB (European patent), IT (European patent), JP, LU (European patent), NL (European patent), SE (European patent).



Published 

With international search report. 
Before the expiration of the time limit for amending the 
claims and to be republished in the event of the receipt of 
amendments. 



(54) Title: MAPPING QUANTITATIVE TRAITS USING GENETIC MARKERS 



[Front-page figure: distribution of the percent recurrent parent genome among the backcross progeny (x-axis: Percent recurrent parent genome, 50 to 100). Mean = 75.0%, Min = 58.9%, Max = 90.3%.]



(57) Abstract 

A systematic and accurate method for mapping or locating the genomic regions or subregions containing polygenic factors which control a quantitatively inherited trait or traits of interest. The present method, which can be applied to higher plants and animals, makes it possible to determine with a high degree of accuracy that a quantitative trait locus (QTL) lies within a genomic region or subregion bounded by selected genetic markers.



*(Referred to in PCT Gazette No. 14/1990, Section III)






FOR THE PURPOSES OF INFORMATION ONLY

Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT.

AT  Austria
AU  Australia
BB  Barbados
BE  Belgium
BF  Burkina Faso
BG  Bulgaria
BJ  Benin
BR  Brazil
CA  Canada
CF  Central African Republic
CG  Congo
CH  Switzerland
CM  Cameroon
DE  Germany, Federal Republic of
DK  Denmark
ES  Spain
FI  Finland
FR  France
GA  Gabon
GB  United Kingdom
HU  Hungary
IT  Italy
JP  Japan
KP  Democratic People's Republic of Korea
KR  Republic of Korea
LI  Liechtenstein
LK  Sri Lanka
LU  Luxembourg
MC  Monaco
MG  Madagascar
ML  Mali
MR  Mauritania
MW  Malawi
NL  Netherlands
NO  Norway
RO  Romania
SD  Sudan
SE  Sweden
SN  Senegal
SU  Soviet Union
TD  Chad
TG  Togo
US  United States of America







WO 90/04651 



PCT/US89/04688 



MAPPING QUANTITATIVE TRAITS USING GENETIC MARKERS

Background of the Invention



Although some important characteristics of agricultural crops and animals are determined by genetic loci which have major effects on phenotype, most economically valuable traits are quantitative in nature. In the case of a quantitatively inherited trait, phenotypic variation shows continuous quantitative differences among individual plants or animals, and the phenotypic values for the trait in a given population are normally distributed. The variation is continuous because the phenotype is determined by the collective effect of numerous genetic loci, each of which makes a small quantitative contribution. Traits which exhibit such genetic variation are often referred to as polygenic traits, because genetic variation at a large number of loci affects their phenotypic expression. Loci which contribute to polygenic traits are often referred to as "minor genes"; "major genes" are those in which variation results in substantial effects on phenotype and which exhibit a Mendelian pattern of inheritance. Beckmann, J.S. and M. Soller, Oxford Surveys of Plant Molecular and Cell Biology, 3:196-250 (1986).



The conflict between the Mendelian theory of particulate inheritance and the observation that most traits in nature exhibit continuous variation was resolved in the early 1900s by the concept that quantitative traits can result from the segregation of multiple genes, modified by environmental effects. Mendel, G., Verh. des Naturf. Vereines in Brünn, 4 (1866); Johannsen, W., Elemente der exakten Erblichkeitslehre (Fischer, Jena, 1909); Nilsson-Ehle, H., Kreuzungsuntersuchungen an Hafer und Weizen (Lund, 1909); East, E.M., Genetics, 1:164-176 (1915); Wright, S., Evolution and the Genetics of Populations (Univ. of Chicago, Chicago, 1968). Although polygenes have been shown to exhibit a Mendelian pattern of inheritance, it has not been possible to separate the effects of the individual loci acting in this way. Pioneering experiments showed that linkage to such quantitative trait loci (QTLs) could occasionally be detected. However, accurate and systematic mapping of QTLs has not been possible, because the inheritance of an entire genome could not be followed with genetic markers. Sax, K., Genetics, 8:552-556 (1923); Thoday, J.M., Nature, 191:368-369 (1961); Tanksley, S.D. et al., Heredity, 49:11-25 (1982); Edwards, M.D. et al., Genetics, 116:113-125 (1987). The use of restriction fragment length polymorphisms (RFLPs) has made such investigations possible, at least in principle. Botstein, D. et al., American Journal of Human Genetics, 32:314-331 (1980).

DNA polymorphisms, which are differences in the nucleotide sequence of a region of DNA, are of several types. For example, variation in the nucleotide sequence of DNA which results from a point mutation can, in turn, result in the gain or loss of a restriction site for a particular restriction endonuclease. Changes in DNA which involve larger regions (deletions, additions, inversions, translocations) change the relative distribution of restriction sites for several restriction endonucleases. In both cases, endonuclease digestion of the region affected by the mutation produces DNA restriction fragments which differ in size distribution from fragments similarly obtained from an unaffected individual (i.e., one in whom such a mutation or alteration is not present). The result is an RFLP, which, in the case of a point mutation, is generated only by enzymes whose recognition sites include the mutation and, in the case of changes in larger regions, is generated by a number of enzymes.
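As a toy illustration of how a single base change produces an RFLP, the sketch below performs an in-silico complete digest of two invented allele sequences. It assumes EcoRI (recognition site GAATTC, cut after the G) as the "particular restriction endonuclease"; the sequences, fragment sizes and helper names are hypothetical and are not taken from this application.

```python
# Toy in-silico digest showing how a point mutation creates an RFLP.
# Assumptions: EcoRI (site GAATTC, cuts G^AATTC) stands in for "a particular
# restriction endonuclease"; both allele sequences are invented.

ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts between the G and the first A


def cut_positions(seq: str, site: str = ECORI_SITE, offset: int = ECORI_CUT_OFFSET) -> list[int]:
    """Return the 0-based cut positions of the enzyme in a linear sequence."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    return cuts


def fragment_lengths(seq: str) -> list[int]:
    """Lengths of the fragments left by a complete digest, in order."""
    bounds = [0] + cut_positions(seq) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


LEFT, MIDDLE, RIGHT = "ATGC" * 5, "TTAG" * 10, "CCGA" * 6
allele_a = LEFT + "GAATTC" + MIDDLE + "GAATTC" + RIGHT  # two intact sites
allele_b = LEFT + "GAATTC" + MIDDLE + "GAATTT" + RIGHT  # C->T destroys site 2

print("allele A fragments:", fragment_lengths(allele_a))
print("allele B fragments:", fragment_lengths(allele_b))
# A heterozygote shows both sets of bands (codominance).
print("heterozygote bands:", sorted(set(fragment_lengths(allele_a)) | set(fragment_lengths(allele_b))))
```

The mutated allele loses one cut, so two fragments of allele A are replaced by a single longer fragment in allele B; a heterozygote displays the bands of both alleles.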

RFLPs have been shown to be stably inherited and, in the case of nuclear RFLPs, to exhibit codominance and generally to lack obvious phenotypic effects. In addition, RFLPs often exhibit multiple alleles. RFLPs can be assigned locations on a chromosome map using art-recognized techniques. Beckmann, J.S. and M. Soller, Oxford Surveys of Plant Molecular and Cell Biology, 3:196-250 (1986).

The inability to manipulate QTLs has historically been a major limitation of genetic engineering and classical breeding. Systematic and accurate mapping of QTLs has not been possible because of the difficulty of arranging crosses with genetic markers densely spaced throughout an entire genome. RFLP techniques make it possible to attempt to identify and manipulate QTLs by, for example, determining RFLP-QTL linkage, followed by mapping and evaluating QTL effects; directly identifying allelic variation at a QTL; and using insertional mutagenesis to identify and clone QTLs. It would be very beneficial to have a method by which QTLs could be identified, accurately mapped and introduced, as needed, into plants and animals, particularly for the development of new strains which exhibit desirable features.

Summary of the Invention

The present invention relates to a systematic and accurate method for mapping, or precisely locating, the genomic regions containing polygenic factors which control a quantitatively inherited trait or traits of interest. Described herein is the first method by which a QTL can be accurately mapped to an interval. That is, unlike previously available methods of determining QTLs, the present method makes it possible to determine with a high degree of accuracy that a QTL lies within an interval (e.g., within a region of DNA bounded by two selected markers, or a subregion thereof). The accuracy with which mapping can be carried out by the present method is particularly valuable in at least two applications: gene cloning and gene transfer. In gene cloning, for example, the present method makes it possible to delimit the region in which DNA encoding a quantitative trait of interest is highly likely to occur, thus maximizing the likelihood that the DNA of interest is, in fact, isolated and minimizing the effort which must be expended in its isolation. Similarly, the present method is very valuable when a gene or gene portion is to be transferred from one plant or animal (e.g., a non-agricultural or wild-type strain) to another plant or animal (e.g., an agricultural variety). As in gene cloning, tight intervals in gene transfer increase the likelihood that the desired gene will be transferred and decrease the effort (and concomitant time and expense) necessary to transfer the gene or gene portion.

The method of the present invention makes use of genetic markers to map (locate) QTLs, to estimate or predict their phenotypic effects and to greatly reduce the number of progeny which must be scored with the DNA markers. In the case described herein, RFLPs, isozymes and two specific genes were the genetic markers used. However, any genetic markers, such as any detectable DNA polymorphisms or DNA sequence differences, codominant protein polymorphisms, or a combination of codominant genetic markers, can be used for the same purpose. Any set of scorable genetic markers (codominant or recessive) which covers most of the genome of the plant or animal being assessed can be used.

As a result of following an entire genetic map of RFLPs in a cross of two strains which differ in one or more quantitatively inherited traits, it has been possible to develop analytical methods useful for the systematic and accurate mapping of the genes underlying these traits and, then, by following the inheritance of nearby markers, for breeding them into new strains. The availability of complete RFLP linkage maps, such as that described herein, has made it possible to dissect quantitative traits into discrete genetic factors. Subsequently, QTLs can be mapped and isogenic lines can be constructed which differ only in the region of the QTL, by using the RFLPs to select for the desired region and against the remainder of the genome. Such isogenic lines can be used in conjunction with the fundamental tools of genetics and molecular biology to study a trait, including testing of complementation and epistasis; characterization of physiological and biochemical differences between isogenic lines; isolation of additional alleles via mutagenesis or further selective breeding (at least in favorable systems); and, eventually, molecular cloning of the genes underlying quantitative inheritance. The method of the present invention has broad application to the breeding of plants and animals for agriculturally valuable traits, particularly because it allows for deterministic breeding.

Systematic genetic dissection of quantitative traits using complete RFLP linkage maps is valuable in a broad range of biological endeavours. For example, agricultural traits such as resistance to diseases and pests, tolerance to drought, heat, cold and other adverse conditions, and nutritional value can be mapped and introgressed into domestic strains from exotic relatives (Rick, C.M., Genes, Enzymes, and Populations, Plenum Press, pp. 255-268 (1973); Harlan, J.R., Crop Science, 16:329-333 (1976)). Aspects of mammalian physiology such as hypertension, atherosclerosis, diabetes, predispositions to cancer and teratomas, alcohol sensitivity, drug sensitivities and some behaviours can be investigated in animal strains differing widely in these traits (Tanase, H. et al., Japanese Circulation Journal, 34:1197-1212 (1970); DeJong, W., Handbook of Hypertension, Vol. 4, Elsevier (1984); Paigen, B.A. et al., Atherosclerosis, 57:65-73 (1985); Prochazka, M. et al., Science, 237:286-289 (1987); Heston, W.E., J. Natl. Cancer Inst., 3:79-82 (1942); Kalter, H., Genetics, 39:185-196 (1954); Malkinson, A.M. & D.S. Beer, J. Natl. Cancer Inst., 70:931-936 (1983); Shire, J.G.M., In: Genetic and Environmental Influences on Behavior, pp. 194-205 (1968); Stewart, J. & R.C. Elston, Genetics, 73:675-693 (1973); Festing, M.F.W., Inbred Strains in Biomedical Research, Oxford University Press, Oxford (1979)). Evolutionary questions about speciation can be elucidated by determining the number and nature of the genes involved in reproductive barriers (Coyne, J.A. & B. Charlesworth, Heredity, 57:243-246 (1986)). An example of such genetic dissection is described herein: in an interspecific cross in tomato, QTLs affecting fruit weight, concentration of soluble solids and fruit pH are mapped to within about 20-30 centiMorgans (cM; 1 cM corresponds to 1% recombination, or roughly 10^5 to 10^7 bp) by means of a complete RFLP linkage map.

Brief Description of the Drawings

Figure 1 shows the frequency distributions for fruit mass, soluble-solids concentration (°Brix, a standard refractometric measure which primarily detects reducing sugars but is also affected by other soluble constituents; 1 °Brix is approximately 1% w/w) and pH in the E parental strain and in the backcross (BC) progeny.

Figure 2 is the distribution of recurrent parent (E) genotype in the 237 backcross progeny, estimated on the basis of the marker genotypes and their relative distances.

Figure 3 shows QTL likelihood maps indicating LOD scores for fruit weight (solid lines and bars), soluble-solids concentration (dotted lines and bars) and pH (hatched lines and bars) throughout the 862 cM spanned by the 70 genetic markers. The RFLP linkage map used in the analysis is presented along the abscissa, in Kosambi cM.

Figure 4 is a schematic drawing of phenotypic distributions in the A and B parental, F1 hybrid and B1 backcross populations.

Figure 5 shows LOD scores for a hypothetical quantitative trait, based on simulated data for 250 backcross progeny in an organism with 12 chromosomes of 100 cM each.

Figure 6 is a graphic representation of LOD scores for a chromosome containing two QTLs.

Figure 7 shows the LOD threshold required for the chance of a false positive occurring anywhere in the genome to be at most 5%, as a function of genome size and the density of RFLPs scored.

Figure 8 shows that progeny whose phenotypes exceed the mean by more than L standard deviations make up a proportion Q(L) of the population, but account for a proportion S(L) of the total LOD score for the progeny.

Figure 9 shows that if only individuals whose phenotypes exceed the mean by more than L standard deviations are genotyped, the number of progeny genotyped may be decreased by a factor of g(L) if the number of progeny grown and phenotyped is increased by a factor of h(L).

Figure 10 shows the number of backcross progeny that must be genotyped to map a QTL, as a function of the fraction of the backcross variance explained by segregation of the QTL. In the traditional approach, all progeny are genotyped and single markers are analyzed. In the method of the present invention, only the progeny with the 5% most extreme phenotypes are genotyped, and interval mapping is used to analyze the data.

Figure 11 shows the number of backcross progeny that must be genotyped to map a QTL, as a function of the difference D between the strains (measured in environmental standard deviations) and the number K of effective factors. In the traditional approach (Panel A), all progeny are genotyped and single markers are analyzed. In the method of the present invention (Panel B), only the progeny with the 5% most extreme phenotypes are genotyped, and interval mapping is used to analyze the data.
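Figures 8 and 9 concern selective genotyping of phenotypically extreme progeny. The sketch below shows one way to compute curves of the Q(L) and S(L) kind under assumptions that are not spelled out in this text: phenotypes are standardized normal, "exceeding the mean by more than L standard deviations" is read as |z| > L (both tails, consistent with genotyping the most extreme progeny), and each individual's expected contribution to the LOD score is taken as proportional to its squared standardized phenotype, a small-effect approximation. The function names are hypothetical, and the factors g(L) and h(L) of Figure 9 are not reproduced.

```python
import math

# Assumptions (labelled, not from the application): standardized normal
# phenotypes, two-tailed selection |z| > L, and a per-individual LOD
# contribution proportional to z**2 (small-effect approximation).

def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def q_fraction(L: float) -> float:
    """Proportion of progeny with |z| > L, i.e. 2 * (1 - Phi(L))."""
    return math.erfc(L / math.sqrt(2.0))


def s_fraction(L: float) -> float:
    """Share of E[z**2] carried by progeny with |z| > L.

    Integrating z**2 * phi(z) over both tails gives 2*L*phi(L) + Q(L).
    """
    return 2.0 * L * normal_pdf(L) + q_fraction(L)


def threshold_for_fraction(q: float, lo: float = 0.0, hi: float = 10.0) -> float:
    """Bisection for the L at which Q(L) == q (Q decreases as L grows)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if q_fraction(mid) > q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


L5 = threshold_for_fraction(0.05)  # cut-off for the 5% most extreme progeny
print(f"L = {L5:.3f}, Q = {q_fraction(L5):.3f}, S = {s_fraction(L5):.3f}")
for L in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"L = {L:.1f}  Q(L) = {q_fraction(L):.3f}  S(L) = {s_fraction(L):.3f}")
```

Under these assumptions the 5% most extreme progeny carry roughly 28% of the expected information, which is the intuition behind genotyping only the tails of the phenotypic distribution.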



The present invention is based on a method of resolving quantitative traits into discrete Mendelian factors using genetic markers. In the work described herein, a complete RFLP linkage map was available and RFLPs were the genetic markers used. However, any genetic markers can be used; these will generally be codominant markers, such as DNA polymorphisms, isozymes or other codominant protein polymorphisms, or a combination of such markers. Described herein is the first use of such a complete RFLP linkage map to resolve quantitative traits into discrete Mendelian factors in an interspecific backcross of a higher plant. In the instant case, a series of QTLs controlling selected traits in the higher plant were mapped using the new analytical methods described herein. These methods, which can be applied in a similar manner to mapping QTLs in other plants and in animals, have made it possible, for the first time, to map QTLs to DNA intervals with a high degree of accuracy, thus maximizing the likelihood that a QTL of interest lies within an interval defined by two selected markers. This approach is broadly applicable to the genetic dissection of the quantitative inheritance of physiological, morphological and behavioral traits in any higher plant or animal.






The interval mapping method of the present invention and its application to a higher plant are described in detail herein. The method can be used to locate precisely the genomic regions which contain polygenic factors controlling a quantitatively inherited trait or traits. Briefly, the method entails five steps: 1) choosing a pair of interfertile strains (i.e., two types or varieties of a plant or animal which differ in a trait of interest), which will serve as the parent strains in the initial cross (and one of which may serve as the recurrent parent in subsequent crosses if backcrosses are employed); 2) constructing a genetic linkage map (using RFLPs, isozymes and/or other codominant markers), if an adequate map is not already available; 3) arranging one or more backcrosses or intercrosses, using as the recurrent parent the strain or type of plant or animal in which the transferred gene (or genes) is to function; 4) scoring progeny of the backcrosses or intercrosses for the trait or traits of interest and for the genetic markers comprising the linkage map; and 5) applying an algorithm designed to maximize the likelihood of a specific (selected) function based on the data obtained in step 4).
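Steps 3) and 4) can be illustrated with simulated data of the kind used for Figure 5 (250 backcross progeny, chromosomes of 100 cM). The sketch below is a minimal simulation of backcross (BC1) progeny on a single chromosome, assuming no crossover interference (Haldane's model), a single additive QTL and normally distributed environmental noise; the marker spacing, QTL position, effect size and function names are hypothetical choices for illustration only.

```python
import math
import random

# Minimal BC1 simulation: F1 (E/CL) x E.  Each progeny carries one recombinant
# F1 gamete, so its genotype at any locus is E/E (coded 0) or E/CL (coded 1).
# Assumptions: Haldane map (no interference), one additive QTL, normal noise.

def haldane_r(d_cm: float) -> float:
    """Recombination fraction for a map distance in cM under Haldane's model."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def simulate_backcross(n_progeny, marker_cm, qtl_cm, effect, noise_sd=1.0, seed=0):
    """Return a list of (marker_genotypes, phenotype) tuples.

    marker_cm: marker positions (cM) on one chromosome.
    effect: phenotypic difference E/CL minus E/E at the QTL (hypothetical).
    """
    rng = random.Random(seed)
    loci = sorted(set(marker_cm) | {qtl_cm})
    progeny = []
    for _ in range(n_progeny):
        het = rng.random() < 0.5  # genotype at the first locus
        genotype = {}
        previous = None
        for pos in loci:
            # A crossover in the F1 gamete switches the genotype class.
            if previous is not None and rng.random() < haldane_r(pos - previous):
                het = not het
            genotype[pos] = int(het)
            previous = pos
        phenotype = effect * genotype[qtl_cm] + rng.gauss(0.0, noise_sd)
        progeny.append(([genotype[p] for p in marker_cm], phenotype))
    return progeny


markers = [0, 20, 40, 60, 80, 100]  # one 100 cM chromosome
data = simulate_backcross(250, markers, qtl_cm=50, effect=1.0)
print(len(data), "progeny; first:", data[0])
```

The phenotype and marker genotypes produced here are exactly the inputs that step 5) would analyze.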

An example of application of the interval mapping 
method of the present invention to locate genomic regions 
containing QTLs of interest in tomato plants is presented 
in the following sections. 

The following is a description of: a) the parent plants used in the interspecific backcross; b) assessment of backcross progeny; c) construction of a genetic linkage map; d) interval mapping of QTLs; e) a summary or overview of key considerations; and f) possible applications of QTL mapping based on the resulting data.

a. Parent plants

Described below is the resolution of quantitative traits, using a complete RFLP linkage map, in an interspecific backcross of two types of tomato plants: the domestic tomato Lycopersicon esculentum (L. esculentum) cv. UC82B (denoted E) and a wild South American green-fruited tomato, Lycopersicon chmielewskii (L. chmielewskii) accession LA1028 (denoted CL). Chmielewski, T., Genet. Pol., 9:97-124 (1968).

These strains have very different fruit masses (E approximately 65 g; CL approximately 5 g) and concentrations of soluble solids (E approximately 5%; CL approximately 10%). These are traits of agricultural importance because together they determine the yield of tomato paste. Rick, C.M., Hilgardia, 42:493-510 (1974). In addition, the strains are known to be polymorphic for genes affecting fruit pH, which is important for the optimal preservation of tomato products; the difference in pH between the parental strains is, however, small. Tanksley, S.D. and J. Hewitt, Theor. Appl. Genet., 75:811-823 (1988).






b. Backcross progeny assessed

A total of 237 backcross plants, with E as the recurrent parent, were grown in the field at Davis, California. Between five and 20 fruit from each plant were assayed for fruit mass, soluble-solids concentration (°Brix) and pH, each of which showed continuous variation (Figure 1). Tanksley, S.D. and J. Hewitt, Theor. Appl. Genet., 75:811-823 (1988). °Brix is a standard refractometric measure which primarily detects reducing sugars; it is also affected by other soluble constituents. 1 °Brix is approximately 1% w/w. Example 1 is a detailed description of the backcross and the assessment of backcross progeny.

c. Construction of a genetic linkage map

A genetic linkage map of tomato with more than 300 RFLPs and 20 isozyme markers had previously been constructed by analyzing 46 F2 individuals derived from L. esculentum cv. VF36 x L. pennellii accession LA716 (E x P). Tanksley, S.D. et al., Proc. 18th Stadler Genet. Symp., in press. The map is essentially complete: it has linkage groups covering all 12 tomato chromosomes, with an average spacing of 5 cM between markers (1 cM is the distance along the chromosome which gives a recombination frequency of one percent). For QTL mapping, a selected subset of markers spaced at intervals of approximately 20 cM and displaying polymorphism between the E and CL strains was used. These included 63 RFLPs and five isozyme markers. In addition, the E and CL strains differ in two easily scored, simply inherited morphological traits: determinacy (described below) and uniform ripening, controlled by the sp and u genes, respectively. Although a few distal regions did not contain appropriate markers, it is estimated that about 95% of the tomato genome was detectably linked to the markers used.
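The definition of 1 cM as 1% recombination holds only for short distances; over longer intervals a mapping function converts recombination fractions into additive map distances, and Figure 3 reports distances in Kosambi cM. The sketch below implements the standard Kosambi and Haldane mapping functions; showing Haldane alongside Kosambi, and the function names, are illustrative additions rather than anything taken from this application.

```python
import math

# Standard mapping functions; r is a recombination fraction (0 <= r < 0.5),
# distances are in centiMorgans.

def kosambi_cm(r: float) -> float:
    """Kosambi: d = (1/4) ln[(1 + 2r) / (1 - 2r)] Morgans."""
    return 100.0 * 0.25 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm: float) -> float:
    """Inverse Kosambi: r = (1/2) tanh(2d), with d in Morgans."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


def haldane_cm(r: float) -> float:
    """Haldane (no interference): d = -(1/2) ln(1 - 2r) Morgans."""
    return -100.0 * 0.5 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm: float) -> float:
    """Inverse Haldane: r = (1/2)(1 - exp(-2d)), with d in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


for d in (1.0, 5.0, 14.3, 20.0, 50.0):
    print(f"{d:5.1f} cM -> r = {kosambi_r(d):.4f} (Kosambi), {haldane_r(d):.4f} (Haldane)")
```

At 1 cM both functions give r close to 0.01, while at the roughly 20 cM marker spacing used here they already differ (about 0.19 versus 0.16), which is why the choice of mapping function matters when quoting map lengths.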

These 70 genetic markers were scored for each of the 237 E x CL backcross progeny (as described by Tanksley and Hewitt), and a linkage map was constructed de novo using MAPMAKER. Tanksley, S.D. and J. Hewitt, Theor. Appl. Genet., 75:811-823 (1988); Lander, E.S. et al., Genomics, 1:174-181 (1987). The map covers all 12 chromosomes with an average spacing of 14.3 cM. Although the linear order of markers inferred from the E x CL cross essentially agreed with that inferred from the E x P cross, differences were noted (see Example 1). Genetic distances differed markedly in certain intervals (for example, 51 cM in E x P and 11 cM in E x CL for the distance between the 45S ribosomal repeat and TG IB on chromosome 2). In total, the markers scored in both crosses span 852 cM in the E x CL map versus 1103 cM in the E x P map, a highly significant (P < 0.01) difference. Skewed segregation (P < 0.05) was detected for 48 of the 70 markers, comprising 21 distinct regions distributed over all 12 chromosomes. The heterozygote (E/CL) was overabundant in 12 cases, whereas in nine cases the homozygote (E/E) was favoured. Overall, the effects of skewing approximately cancelled each other out: on average, the backcross contained the expected 75% E genome (Figure 2).
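In a backcross each marker is expected to segregate 1:1 (E/E : E/CL), and the recurrent-parent share of the genome is expected to be 1/2 + 1/4 = 75% after one backcross generation. A minimal sketch of a chi-square goodness-of-fit test for skewed segregation follows. The counts are hypothetical (only the approximate 4:1 ratio reported below for TG16 comes from this document), the function names are illustrative, and the p-value uses the one-degree-of-freedom chi-square tail, P(X > x) = erfc(sqrt(x/2)).

```python
import math

def segregation_chi2(n_hom: int, n_het: int) -> float:
    """Chi-square statistic for a 1:1 backcross ratio (1 df).

    With expected counts n/2 in each class, sum((O - E)**2 / E)
    simplifies to (n_hom - n_het)**2 / n.
    """
    n = n_hom + n_het
    return (n_hom - n_het) ** 2 / n


def chi2_1df_pvalue(x: float) -> float:
    """Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def expected_recurrent_fraction(generations: int) -> float:
    """Expected recurrent-parent genome after t backcrosses: 1 - (1/2)**(t + 1)."""
    return 1.0 - 0.5 ** (generations + 1)


# Hypothetical counts out of 237 backcross progeny (E/E, E/CL).
for label, hom, het in [("balanced", 120, 117), ("about 4:1", 190, 47)]:
    x = segregation_chi2(hom, het)
    print(f"{label}: chi2 = {x:.2f}, P = {chi2_1df_pvalue(x):.3g}")
print("expected E genome after one backcross:", expected_recurrent_fraction(1))
```

A ratio near 4:1 in 237 progeny is far outside what a 1:1 expectation allows, whereas small departures such as 120:117 are entirely consistent with it.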

d. Interval mapping of QTLs

Next, the question of mapping the Mendelian factors that underlie continuous variation in fruit mass, soluble-solids concentration and pH was addressed. The method of maximum likelihood and lod scores, commonly used in human linkage analysis, has recently been adapted to allow interval mapping of QTLs. Ott, J., Analysis of Human Genetic Linkage (Johns Hopkins, Baltimore, 1985); Lander, E.S. and D. Botstein, Genetics, 121:185-199 (1989). Both were used here. At each position in the genome, one computes the 'most likely' phenotypic effect of a putative QTL affecting a trait (the effect which maximizes the likelihood of the observed data arising) and the odds ratio (the chance that the data would arise from a QTL with this effect divided by the chance that it would arise given no linked QTL). The lod score, defined as the log10 of the odds ratio, summarizes the strength of evidence in favour of the existence of a QTL with this effect at this position; if the lod score exceeds a predetermined threshold, the presence of a QTL is inferred. The traditional approach to mapping QTLs, as described by Tanksley et al. and Edwards et al., involves standard linear regression, which accurately measures the effect of QTLs falling at marker loci only, underestimating the effects of other loci in proportion to the amount of recombination between marker and QTL. Tanksley, S.D. et al., Heredity, 49:11-25 (1982); Edwards, M.D. et al., Genetics, 116:113-125 (1987). In contrast, interval mapping allows inference about points throughout the entire genome and avoids confounding phenotypic effects with recombination, by using information from flanking genetic markers. In the special case when a QTL falls exactly at a marker locus, interval mapping reduces to linear regression. A computer program, MAPMAKER-QTL, was written to implement interval mapping.
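The interval-mapping computation described above can be sketched for a backcross. This is a minimal illustration in the spirit of Lander and Botstein (1989), not the MAPMAKER-QTL code: it assumes no crossover interference (Haldane's map function), a single QTL with genotypes E/E (coded 0) or E/CL (coded 1), a common normal residual variance, and the EM algorithm for the mixture fit. Function names and arguments are hypothetical. At a putative position d1 cM to the right of the left flanking marker and d2 cM to the left of the right flanking marker, each progeny's probability of being E/CL at the QTL is computed from its flanking-marker genotypes, and the LOD is the log10 of the maximized likelihood ratio against a model with no linked QTL.

```python
import math

# Interval-mapping sketch for a backcross (assumptions: Haldane map, one
# additive QTL, equal residual variance in both QTL genotype classes).

def haldane_r(d_cm: float) -> float:
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def normal_pdf(y: float, mean: float, var: float) -> float:
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def qtl_prior(left: int, right: int, r1: float, r2: float) -> float:
    """P(QTL genotype is E/CL | flanking marker genotypes), no interference."""
    r = r1 + r2 - 2.0 * r1 * r2  # recombination between the two markers
    if left == 1 and right == 1:
        return (1.0 - r1) * (1.0 - r2) / (1.0 - r)
    if left == 1 and right == 0:
        return (1.0 - r1) * r2 / r
    if left == 0 and right == 1:
        return r1 * (1.0 - r2) / r
    return r1 * r2 / (1.0 - r)


def interval_lod(left, right, y, d1_cm, d2_cm, iterations=200):
    """Return (LOD, estimated effect E/CL minus E/E) at one position."""
    n = len(y)
    r1, r2 = haldane_r(d1_cm), haldane_r(d2_cm)
    prior = [qtl_prior(a, b, r1, r2) for a, b in zip(left, right)]

    # Null model: a single normal distribution (no linked QTL).
    mean0 = sum(y) / n
    var0 = sum((v - mean0) ** 2 for v in y) / n
    loglik0 = sum(math.log(normal_pdf(v, mean0, var0)) for v in y)

    weight = prior[:]  # start from the marker-based probabilities
    for _ in range(iterations):
        # M-step: weighted class means and pooled variance.
        sw = max(sum(weight), 1e-12)
        sw0 = max(n - sum(weight), 1e-12)
        mu1 = sum(w * v for w, v in zip(weight, y)) / sw
        mu0 = sum((1.0 - w) * v for w, v in zip(weight, y)) / sw0
        var = sum(w * (v - mu1) ** 2 + (1.0 - w) * (v - mu0) ** 2
                  for w, v in zip(weight, y)) / n
        # E-step: posterior probability that each progeny is E/CL at the QTL.
        weight = []
        for p, v in zip(prior, y):
            a = p * normal_pdf(v, mu1, var)
            b = (1.0 - p) * normal_pdf(v, mu0, var)
            weight.append(a / (a + b) if a + b > 0 else p)

    loglik1 = sum(math.log(p * normal_pdf(v, mu1, var) + (1.0 - p) * normal_pdf(v, mu0, var))
                  for p, v in zip(prior, y))
    return (loglik1 - loglik0) / math.log(10.0), mu1 - mu0
```

Evaluating `interval_lod` at a grid of positions within each marker interval and plotting the result gives a likelihood map of the kind shown in Figure 3. When `d1_cm` is 0 the probabilities become exactly 0 or 1 and the fit reduces to comparing the two marker classes, matching the regression special case noted above.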

Because a large number of markers is tested, a very stringent lod score threshold must be adopted to avoid false positives. Given the genetic length of the tomato genome and the density of markers used, a threshold of 2.4 gives a probability of less than 5% that even a single false positive will occur anywhere in the genome. Lander, E.S. and D. Botstein, Genetics, 121:185-199 (1989). This is approximately equivalent to requiring the significance level for any single test to be 0.001.
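The stated equivalence between a LOD threshold of 2.4 and a single-test significance level of about 0.001 can be checked numerically. A minimal sketch, assuming the usual large-sample result that 2 ln(10) x LOD is approximately chi-square distributed with one degree of freedom (one effect parameter in a backcross); the function names are illustrative:

```python
import math

def lod_to_chi2(lod: float) -> float:
    """Likelihood-ratio statistic 2*ln(LR) corresponding to a LOD (log10 LR)."""
    return 2.0 * math.log(10.0) * lod


def pointwise_p_value(lod: float) -> float:
    """Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(lod_to_chi2(lod) / 2.0))


print(f"LOD 2.4 -> chi2 = {lod_to_chi2(2.4):.2f}, pointwise P = {pointwise_p_value(2.4):.5f}")
```

The result is roughly 0.0009, in line with the single-test level of 0.001 quoted above. The genome-wide 5% guarantee additionally depends on genome length and marker density, which this sketch does not model.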

QTL likelihood maps, showing how lod scores for fruit mass, soluble-solids concentration and pH change as one moves along the genome, reveal multiple QTLs for each trait and estimate their locations to within 20-30 cM (Figure 3).

Factors for fruit mass were found on six chromosomes 
(1, 4, 6, 7, 9 and 11). In each case, CL alleles 
decrease fruit mass (by 3.5 to 6.0 g) , adding to a total 
reduction of 28.1 g inferred for back-cross progeny 
carrying a CL allele at all six loci. This accounts for 
about half of the approximately 60 g difference between E 
and CL. 

Factors for soluble-solids concentration were found on four chromosomes (3, 4, 6 and 7). In each case, CL alleles elevate soluble-solids concentration (by 0.83 to 1.89 °Brix), summing to a total of 4.57 °Brix (versus a difference of approximately 5 °Brix between the parental strains). This large effect in the backcross is consistent with previous reports that high soluble-solids concentration exhibits dominance and overdominance. Rick, C.M., Hilgardia, 42:493-510 (1974); Tanksley, S.D. and J. Hewitt, Theor. Appl. Genet., 75:811-823 (1988). The QTL alleles for both fruit mass and soluble-solids concentration all produce effects in the direction predicted by the difference between the parental strains.

Factors for pH were found on five chromosomes (3, 6, 7, 8 and 10). In addition, the lod score for a putative QTL on chromosome 9 fell just below the threshold. Because the parental strains do not differ greatly in pH, it was suspected that CL alleles might not all produce effects in the same direction. In fact, pH was increased by four QTLs and decreased by two, including the likely QTL on chromosome 9. This provides a genetic explanation for the observation that many backcross progeny exhibited more extreme phenotypes than the parental strains (Figure 1), a phenomenon known as transgression. Simmonds, N.W., Principles of Crop Improvement, 82-85 (Longman, NY, 1981).

Together, the QTLs identified for fruit mass, 
soluble-solids and pH account for 58%, 44% and 48%, 
respectively, of the phenotypic variance among the 
back-cross progeny, with another 13%, 9% and 11% 
attributable to environment. 

The numbers of QTLs reported for each trait must be considered minimum estimates. Because an extremely stringent threshold was used to avoid any false positives, some sub-threshold effects probably represent real QTLs. For example, the regions near TG19 on chromosome 1, CD41 on chromosome 5 and TG68 on chromosome 12 may affect soluble-solids concentration and merit further attention in larger populations. Similarly, the region near the p locus on chromosome 10 may contain an additional QTL affecting pH (see Example 1). Moreover, one cannot rule out the presence of many additional QTLs with tiny phenotypic effects, which has been postulated in evolutionary theory and supported by some experimental evidence. Lande, R., Heredity, 50:47-65 (1983); Shrimpton, A.E. and A. Robertson, Genetics, 118:445-459 (1988). Also, it is conceivable that some of the apparent QTLs actually represent several closely linked QTLs, each with small phenotypic effects in the same direction. Such a phenomenon might arise particularly in regions of genetic map compression. Finally, the QTL mapping here applies strictly only to the specific environment tested and to heterozygosity for CL alleles. In principle, homozygosity for CL alleles could have been studied by selfing the E x CL hybrid, but in practice too many of the progeny are sterile.

Some regions of the genome clearly exert effects on more than one trait (for example, chromosome 6; Figure 3), providing a genetic explanation for at least some of the correlation between the traits. Although the present data are insufficient to distinguish between pleiotropic effects of a single gene and independent effects of tightly linked loci, the frequent coincidence of QTL locations for different traits makes it likely that at least some of the effects are due to pleiotropy.

The region near sp on chromosome 6 has the largest effects on soluble solids and pH, as well as a substantial effect on fruit mass. The sp gene affects plant growth habit: the dominant CL allele causes continuous apical growth (indeterminate habit), whereas the recessive E allele causes termination in an inflorescence ('determinate' or 'self-pruning' habit). Yeager, A.F., J. Hered., 18:263-265 (1927). Although indeterminacy has previously been reported by Emery et al. to elevate both fruit mass and soluble-solids concentration within L. esculentum, it is associated here with reduced fruit mass in both E x CL and another interspecific cross (E x L. cheesmanii). Emery, G.C. and H.M. Munger, J. Am. Soc. Hort. Sci., 95:410-412 (1966). These differing results might be due to a second, tightly linked locus or to unlinked modifier genes.

Overall, pairwise epistatic interactions between
intervals were not common (about 5% of two-way
analysis-of-variance tests were significant at the 0.05
level, roughly the rate expected by chance alone).
An interesting exception was the region near TG16 on
chromosome 8, at which the CL allele significantly
enhanced the effect of three of the four QTLs for
soluble-solids concentration. TG16 also showed the most
extreme segregation distortion of any marker scored
(about 4:1 in favour of the E/E homozygote) and is in a
region known to exhibit skewed segregation in
back-crosses to other green-fruited tomato species.
Zamir, D. and Y. Tadmor, Bot. Gaz., 147:355-358 (1986);
Tanksley, S.D. In: Isozymes in Plant Genetics and
Breeding, (eds. Tanksley, S.D. and T.J. Orton) 331-338
(Elsevier, Amsterdam, 1983). The unusual properties of
this region of CL clearly merit further study.

The QTLs identified here may well differ from those
that would be fixed by repeated back-crossing with
continuing selection for a trait, a classical method for
introgressing quantitative traits. Work on LA1563, a
strain with increased soluble solids produced through
back-crossing a different strain of E to CL, has provided
some suggestive evidence. Rick, C.M., Hilgardia,
42:493-510 (1974). By surveying RFLPs, Tanksley and
Hewitt recently found that LA1563 has maintained three
separate regions from CL: near CD56 on chromosome 10,
near Got2 on chromosome 7 and near TG13 on chromosome 7.
Here, above-threshold effects were detected in the last
of these three regions only (which, interestingly, failed
to show effects on soluble solids in a single-environment
test by Tanksley and Hewitt). Tanksley, S.D. and J.
Hewitt, Theor. Appl. Genet., 75:811-823 (1988).
Moreover, QTLs affecting soluble-solids concentration
were detected in regions that did not seem to be
retained. Unfortunately, the results of the two
experiments are not directly comparable due to the use of
a different strain by Rick, possible environmental
differences between the experiments, the possibility that
small CL fragments containing QTLs went undetected in
LA1563, the possibility that the region near TG13
retained in LA1563 may not contain the QTL detected here,
and the possibility that some of the sub-threshold
effects are real. Although more detailed studies are
clearly needed, it is interesting to speculate about why
repeated back-crossing may fix a narrower class of QTLs
than is found by QTL mapping. Because such breeding
programs demand horticultural acceptability, they are
likely to select against otherwise-desirable QTLs that
are closely linked to undesirable effects from the wild
parent. If such QTLs can first be identified by mapping,
it may be feasible to remove linked deleterious effects
by recombination.

Once several QTLs with relatively large effects have
been mapped, crosses can be used to isolate them in
near-isogenic lines. These lines can be used to
characterize the QTLs in various dosages, genetic
backgrounds, environments and combinations. By
re-assembling selected CL alleles in an otherwise E
genotype, it should be possible to engineer an
agriculturally-useful tomato with a higher yield of
soluble solids.

e. Overview of Key Considerations

Although it has long been recognized that 
quantitative traits often arise from the combined action 
of multiple Mendelian factors, only recently has it 
30 become practical to undertake systematic mapping of such 
QTLs. While such investigations will by no means be 
easy, the methodology developed here should increase 
their accuracy and efficiency. Specifically, by 
integrating information from genetic markers spaced 
throughout a genome, the method of interval mapping 
described above allows (i) efficient detection of QTLs 
while limiting the overall occurrence of false positives; 
(ii) accurate estimation of phenotypic effects of QTLs; 
and (iii) localization of QTLs to specific regions 
(Figure 6). Beyond the increased efficiency due to 
interval mapping, the strategy of selective genotyping 
(when applicable) can further reduce the number of 
progeny that must be genotyped in order to detect a QTL. 
Together, the methods lead to a reduction of up to 7-fold 
in the number of progeny to be genotyped. Finally, 
additional savings may be achieved via progeny testing 
and simultaneous search. The main considerations in 
designing a cross for genetic dissection of a 
quantitative trait are as follows: 

1. Designing a cross: Strains can be chosen to
maximize the chance that they segregate for QTLs having
relatively large phenotypic effects, thereby allowing
mapping with a manageable number of progeny. The ideal
situation occurs when (a) the phenotypic difference D
between the strains is large compared to the
environmental or within-strain standard deviation; (b)
breeding experiments indicate that the number k of
effective factors given by Wright's formula is small; and
(c) the strains are the result of selective breeding for
the trait.

2. Specifying the minimum phenotypic effect the cross
will be designed to detect: Once the strains have been
chosen, the experimenter must specify the minimum
phenotypic effect δ that the cross will be designed to
detect. When using strains resulting from selection, a
choice of δ in the range between ½(D/k) and (D/k)
should ensure that QTLs accounting for much of the
phenotypic difference will be detected. When using
arbitrary strains, the same choice of δ can be used,
although the presence of QTLs with this effect is not
guaranteed.

3. Calculating the number of backcross progeny to be
genotyped: The number N of backcross progeny that
should be genotyped can then be calculated based on the
spacing d between genetic markers in the map, the
appropriate threshold T for the LOD score, and the
desired probability of success, assuming either (i) the
traditional method of analysis involving single markers
and genotyping all progeny or (ii) interval mapping and
selective genotyping of the 5% most extreme progeny.

Figure 10 shows N as a function of the fraction of
variation v explained by the QTL (where $v = \delta^2/(16\sigma^2_{B1})$),
while Figures 11a and 11b show N as a function of the phenotypic
difference D between the strains and the number k of
effective factors. Together, interval mapping and
selective genotyping reduce the number of progeny to be
genotyped by up to 7-fold. Both Figures 10 and 11 assume
that d = 20 cM, T = 2.5 and a probability of success of
0.50, and Figure 11 assumes that the QTLs have equal
phenotypic effects. For
instances in which different assumptions are made, the
following modifications apply: multiply by 4 to allow
for QTLs having half the average effect; multiply by
approximately $(1-\psi)/[1.25\,(1-2\theta)^2]$ to allow for markers
every d cM (θ and ψ are the recombination fractions
corresponding to ½d and d cM, respectively); multiply by
approximately 1.50 to allow for a 90% chance of success;
multiply by T/2.5 to allow for a LOD
threshold of T; and multiply by about 55% if an F2 
intercross is used instead of a backcross. As a rule of 
thumb, it appears practical to map QTLs when the 
phenotypic difference D measured in environmental 
standard deviations is on the order of the number k of 
effective factors segregating. 
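
The multiplicative adjustments listed above can be collected into a small helper. This is a minimal sketch, not part of the described method. The function name and its parameters are hypothetical, the baseline progeny number still has to be read from Figure 10 or 11, and the marker-spacing correction is left out. The 1/effect² scaling is an assumption, chosen because it reproduces the stated factor of 4 for QTLs with half the average effect.

```python
def adjust_progeny_number(
    base_n: float,
    effect_ratio: float = 1.0,
    lod_threshold: float = 2.5,
    ninety_percent_success: bool = False,
    f2_intercross: bool = False,
) -> float:
    """Scale a progeny number read from Figure 10 or 11 (hypothetical helper).

    The figures assume d = 20 cM, T = 2.5, a 50% chance of success and a
    backcross. Marker spacing other than 20 cM is not handled here.
    """
    n = base_n / effect_ratio**2   # half the effect -> 4x the progeny
    n *= lod_threshold / 2.5       # linear in the LOD threshold T
    if ninety_percent_success:
        n *= 1.50                  # roughly 90% chance of success
    if f2_intercross:
        n *= 0.55                  # F2 intercross instead of a backcross
    return n


# A made-up baseline of 100 progeny, for QTLs with half the assumed effect.
print(adjust_progeny_number(100, effect_ratio=0.5))  # 400.0
```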

The Spontaneous Hypertensive rat (SHR) strain
(Tanase, H. et al., Japan. Circulation Journal,
34:1197-1212 (1970)) was derived from the Wistar-Kyoto
rat (WKY) strain by selective breeding for high systolic
blood pressure followed by inbreeding. Blood pressure in
SHR is about 3 standard deviations higher than in WKY,
while the number of effective factors was estimated at
k ≈ 3. Assuming that the rat genome is about 1500 cM
and that an RFLP map with markers every 20 cM is available,
the appropriate LOD threshold would be about 2.7 (see Figure 7).
Using the traditional approach, one would need about 325
backcross progeny or about 175 F2 intercross progeny.
With interval mapping, these become about 275 and 145.
If it were practical to grow a larger population but
genotype only those progeny with the 5% most extreme
blood pressures, the number of progeny to genotype could
be reduced to about 55 and 30, respectively.
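
The rat figures can be connected to the fraction of variation v used on the horizontal axis of Figure 10 with a little arithmetic. The sketch below is illustrative only and rests on three assumptions: there are k equal-effect QTLs, so each has effect δ = D/k; phenotypes are measured in environmental standard deviations; and the backcross genetic variance is obtained by rearranging Wright's formula (given in Example 2) as σ²_G = D²/(16k). The function name is hypothetical.

```python
def variance_fraction_explained(d_env_sd: float, k: float) -> float:
    """v = delta^2 / (16 * sigma2_B1) for one of k equal-effect QTLs.

    Phenotypes are in environmental SD units (sigma2_E = 1), delta = D / k,
    and sigma2_G = D^2 / (16 k) follows from Wright's formula rearranged.
    """
    delta = d_env_sd / k
    sigma2_g = d_env_sd**2 / (16.0 * k)
    sigma2_b1 = 1.0 + sigma2_g  # environmental + genetic variance
    return delta**2 / (16.0 * sigma2_b1)


print(round(variance_fraction_explained(3.0, 3.0), 4))  # SHR x WKY: 0.0526
```

With D = 3 and k = 3 this gives δ = 1, σ²_B1 = 1.1875 and v = 1/19, so each QTL would explain roughly 5% of the backcross variance.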

In addition to SHR, a number of other genetically
hypertensive strains of rat and mouse have been described,
with estimated numbers of factors between 2 and 5 (DeJong
1984). Comparison of these strains would elucidate the
number and location of the most important genes
controlling naturally-occurring variation in blood
pressure, at least in rodent populations. Such
information might shed light on hypertension in humans as
well.

The availability of complete RFLP linkage maps makes
it possible to dissect quantitative traits into discrete
genetic factors, thereby unifying two
historically-separated areas of genetics. Once QTLs have
been mapped, isogenic lines differing only in the region
of the QTL can be rapidly constructed by using the
RFLPs to select for the desired region and against the
remainder of the genome. Soller, M. and J.S. Beckmann,
Theor. Appl. Genet., 47:179-190 (1983); Paterson, A.H. et
al., submitted, 1988.

f. Application of QTL Mapping

The general approach of QTL mapping is broadly
applicable to a wide range of biological endeavors.
For example, in agriculture, it might be desirable to
transfer to domestic strains many quantitative traits
harbored in wild species, including resistance to
diseases and pests, tolerance to drought, heat, cold and
other adverse conditions, efficient use of resources and
high nutritional quality. Rick, C.M., In: Genes,
Enzymes and Populations (Ed. A.M. Srb) 255-268 (Plenum,
NY, 1973); Harlan, J.R., Crop Sci., 16:329-333 (1976).
In mammalian physiology, selective breeding has generated
rodent strains that differ greatly in quantitative
traits, such as hypertension, atherosclerosis, diabetes,
predispositions to cancer, drug sensitivities and various
behavioural patterns. Information on the number,
location and nature of these QTLs would be of value in
medicine. Festing, M.F.W., Inbred Strains in Biomedical
Research (Oxford, 1979). In evolutionary biology, the
process of speciation can be investigated by studying the
number and nature of genes underlying reproductive
isolation. Coyne, J.A. and B. Charlesworth, Heredity,
57:243-246 (1986).

The availability of detailed RFLP linkage maps makes
it possible to dissect quantitative traits into discrete
genetic factors (QTLs): all regions of a genome can be
assayed, and accurate estimates of phenotypic effects and
genetic position can be derived from interval analysis.
Tanksley, S.D. et al., R. Proc. 18th Stadler Genet.
Symp., In Press; Helentjaris, T., Trends in Genet.,
3(8):217-221 (1987); Landry, B.S. et al., Theor. Appl.
Genet., 74:646-653 (1987); Burr, B. et al., Genetics,
118:519-526 (1988); Chang, C. et al., Proceedings of the
National Academy of Sciences, U.S.A., In Press; McCouch,
S.R. et al., Theor. Appl. Genet., In Press; Kosambi,
D.D., Ann. Eugen., 12:172-175 (1944). Once QTLs are
mapped, RFLP markers permit genetic manipulations such as
rapid construction of near-isogenic lines: flanking
markers may be used to retain the QTL, while the
remaining markers may be used to speed progress by
identifying individuals with a fortuitously high
proportion of the desired genetic background (see Example
1). Using isogenic lines, the fundamental tools of
genetics and molecular biology may be brought to bear on
the study of QTLs, including testing of complementation,
dominance and epistasis; characterization of
physiological and biochemical differences between
isogenic lines; isolation of additional alleles by
mutagenesis (at least in favourable systems); and
physical mapping and molecular cloning of genetic factors
underlying quantitative traits.

EXAMPLE 1 Interspecific Backcross and Analysis

The tomatoes were grown in the field at Davis,
California, in a completely randomized design including
237 BC plants (with E as the recurrent pistillate
parent), as well as E, CL and the F1 as controls.
Neither CL nor the F1 progeny matured completely, as is
typical in the central valley of California. Among the
BC plants, six failed to mature and 12 produced too few
fruit to assay reliably for quantitative traits. The
absence of quantitative-trait data for these few progeny
should yield at most a slight bias in the analyses. The
frequency distributions for fruit mass, soluble-solids
concentration (°Brix, a standard refractometric measure
primarily detecting reducing sugars but also affected by
other soluble constituents; 1°Brix is approximately 1%
w/w) and pH in the E parental strain and in the backcross
(BC) progeny are shown in Figure 1. Means and standard
deviations for the distributions of the E parental strain
(E, filled bars) and the BC progeny (BC, open bars) appear
in the upper right of each histogram. The distributions
for soluble-solids concentration and pH are approximately
normal. The distribution of the BC progeny for fruit
weight is clearly skewed; log10(fruit mass) was studied
throughout to achieve approximate normality (E = 1.81 ±
0.07; BC = 1.20 ± 0.19). Wright, S., Evolution and the
Genetics of Populations, (Univ. of Chicago, Chicago,
1968). The proportion of variance due to environment was
estimated as the square of the ratio of the standard
deviations (E/BC) for log-mass, solids and pH. Figure 2
shows the distribution of the percentage of recurrent-parent
(E) genotype in the 237 back-cross progeny, estimated on
the basis of the marker genotypes and their relative
distances. Marker genotypes were determined as
previously described. Tanksley, S.D. and J. Hewitt,
Theor. Appl. Genet., 75:811-823 (1988). Estimates of the
percentage of recurrent-parent genome were produced by
the recently-developed computer program HyperGene™.
Although the average agreed closely with the Mendelian
expectation of 75% for a back-cross, values for
individual plants ranged from 59% to over 90%. The
distribution of the proportion of recurrent-parent genome
agrees with the mathematical expectation. Franklin,
L.A., Theor. Popul. Biol., 11:60-80 (1977); Stam, P.,
Genet. Res., 25:131-155 (1980). The individual with >90%
E appears to carry only five fragments from CL (ranging
from 9 to 47 map units in length) and could be returned
to essentially 100% E with two additional back-crosses of
about 550 plants. This is far more rapid than the 6-8
back-crosses routinely used to eliminate donor genome in
the absence of markers.
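
Two calculations in this Example can be reproduced numerically. The sketch below is a Monte Carlo illustration, not the HyperGene™ program, and its helper names are hypothetical. It computes the environmental fraction (SD_E/SD_BC)² from the log-mass figures quoted above, and it simulates the recurrent-parent genome fraction of BC1 plants under Haldane's no-interference crossover model. For illustration only, the genome is assumed to consist of 12 chromosomes of equal length totalling the 862 cM spanned by the markers.

```python
import random

# Hypothetical genome: 12 equal chromosomes totalling 862 cM (8.62 Morgans).
GENOME_M = [8.62 / 12] * 12


def environmental_fraction(sd_parent: float, sd_backcross: float) -> float:
    """Share of backcross variance attributed to environment: (SD_E / SD_BC)**2."""
    return (sd_parent / sd_backcross) ** 2


def recurrent_fraction_bc1(chrom_lengths_m, rng: random.Random) -> float:
    """Recurrent-parent (E) genome fraction of one simulated BC1 plant.

    The F1 gamete follows Haldane's model: crossovers form a Poisson process
    with rate 1 per Morgan, and the starting parental phase is random. The
    gamete from the recurrent parent is entirely E.
    """
    donor = 0.0  # Morgans of CL genome carried by the F1 gamete
    for length in chrom_lengths_m:
        pos = 0.0
        is_donor = rng.random() < 0.5
        while pos < length:
            seg = min(rng.expovariate(1.0), length - pos)
            if is_donor:
                donor += seg
            pos += seg
            is_donor = not is_donor  # crossover switches phase (harmless at the end)
    total = sum(chrom_lengths_m)
    # The diploid genome is 2 * total; only the F1 gamete can carry CL segments.
    return 1.0 - donor / (2.0 * total)


print(f"{environmental_fraction(0.07, 0.19):.3f}")  # log10(fruit mass): 0.136

rng = random.Random(1)
plants = [recurrent_fraction_bc1(GENOME_M, rng) for _ in range(237)]
print(f"mean {sum(plants) / len(plants):.1%}, "
      f"range {min(plants):.1%} to {max(plants):.1%}")
```

The simulated mean should lie close to the 75% Mendelian expectation, while individual plants scatter widely around it. The exact spread depends on the assumed chromosome lengths.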

QTL likelihood maps indicating lod scores for fruit
mass (solid lines and bars), soluble-solids concentration
(dotted lines and bars) and pH (hatched lines and bars)
throughout the 862 cM spanned by the 70 genetic markers
are shown in Figure 3. The RFLP linkage map used in the
analysis is presented along the abscissa, in Kosambi cM.
Kosambi, D.D., Ann. Eugen., 12:172-175 (1944). The order
of the markers agrees with the previously-published map
of the E x P cross, except for three inversions of
adjacent markers: (TG24-CD15), (TG63-CD32B) and
(TG30-TG36). Tanksley, S.D. et al., R. Proc. 18th
Stadler Genet. Symp., In Press. In the first case,
re-analysis of the E x P data with MAPMAKER indicates
that the order shown here is the more likely order in
both E x P and E x CL. Lander, E.S. et al., Genomics,
1:174-181 (1987); Proc. Natl. Acad. Sci., USA,
84:2363-2367 (1987). For the other two, the orders shown
here are more likely in E x CL by odds of 10:1 and
10^:1, but the inverse is more likely in E x P by 1.1:1
and 8:1 odds. These differences will be investigated in
a larger E x P population. Soluble-solids concentration
and pH were analyzed in °Brix and pH units, respectively;
allele effects on fruit mass are presented in g; log
transformation of fruit mass was used in all analyses to
achieve approximate normality. The maximum-likelihood
effect of a putative QTL, as well as the lod score in
favour of the existence of such a QTL, have been
determined at points spaced every 1 cM throughout the
genome according to the method described herein, and a
smooth curve has been plotted through the points. The height of
the curve indicates the strength of the evidence (log10
of the odds ratio) for the presence of a QTL at each
location, not the magnitude of the inferred allelic
effect. The horizontal line at a height of 2.4 indicates
the stringent threshold that the lod score must cross to
allow the presence of a QTL to be inferred, as described
herein.

Information about the likely position of the QTL can
also be inferred from the curve. The maximum-likelihood
position of the QTL is the highest point on the curve.
Bars below each graph indicate a 10:1 likelihood support
interval for the position of the QTL (the range outside
which the lod score falls by more than 1.0 from its maximum),
whereas the lines extending out from the bars indicate a
100:1 support interval. Phenotypic effects indicated
beside the bars are the inferred effect of substituting a
single CL allele for one of the two E alleles at the QTL.

Several regions show sub-threshold effects on one or
more traits (chromosome 1 near TG19, chromosome 5
near TG34 and chromosome 12 near TG68), which may
represent QTLs; this requires additional testing. The
region near TG68 may be particularly interesting, as it
is the only instance found where the CL allele seems to
decrease soluble-solids concentration (by about
0.7°Brix). In the case of chromosome 10, the lod score
for pH crosses the significance threshold in two places.
Controlling for the presence of a QTL near CD34A, the
presence of a second QTL near p was tested by comparing
the maximum lod score assuming the presence of only the
first QTL with the maximum lod score assuming the
presence of two QTLs. Allowing for a QTL in the
region of CD34A, the residual lod score near p falls
below the required threshold. Thus, the evidence is not
yet sufficient to support the presence of a QTL near p.

Methods. The lod score and the maximum-likelihood
estimate (MLE) of the phenotypic effect at any point in
the genome are computed assuming that the distribution of
phenotypes in the BC progeny represents a mixture of two
normal distributions (of equal variance) with means
depending on the genotype at a putative QTL at the given
position. (Note that QTLs are considered individually
and there is no assumption that different QTL effects can
be added, except in studying the possibility of two QTLs
on chromosome 10 affecting pH.) Specifically, at a given
position in the genome, the likelihood function for
individual i with quantitative phenotype $\phi_i$ is given by

$$L_i(a,\sigma^2) = (2\pi\sigma^2)^{-1/2}\left[p_{i1}\exp\!\left(-\frac{\phi_i^2}{2\sigma^2}\right) + p_{i2}\exp\!\left(-\frac{(\phi_i-a)^2}{2\sigma^2}\right)\right],$$

where a is the effect of substituting a CL allele for an
E allele at a putative QTL in the given position, $\sigma^2$
is the phenotypic variance not attributable to the QTL,
and $p_{i1}$ and $p_{i2}$ are the probabilities that
individual i has genotype E/E and E/CL, respectively, at
the QTL (which can be computed on the basis of the
genotypes at the flanking markers and the distances to the
flanking markers). The likelihood function for the
entire population is $L = \prod_i L_i$. Also, $\hat{a}$ and
$\hat{\sigma}^2$ denote the MLEs allowing the possibility of a
QTL at the location (the values that maximize L), and
$\hat{\sigma}_0^2$ denotes the MLE of $\sigma^2$ subject to the
constraint that no QTL is linked (a = 0). The lod score is then given by

$$\mathrm{lod} = \log_{10}\frac{L(\hat{a},\hat{\sigma}^2)}{L(0,\hat{\sigma}_0^2)}.$$

This method for QTL mapping is developed more fully in Example 2.
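
The Methods paragraph can be illustrated with a minimal sketch of single-position interval mapping in a backcross. This is not the authors' implementation, and all names are hypothetical. The sketch makes three assumptions. First, the Haldane map function with no interference is used to turn flanking-marker genotypes into p_i2 (the map in Figure 3 is in Kosambi cM). Second, a free mean μ for the E/E class is fitted alongside a and σ²; fixing μ = 0 recovers the two-parameter likelihood in the text. Third, the maximization uses expectation-maximization (EM). EM is a swapped-in technique, since the text states only that the estimates maximize L.

```python
import math
import random


def haldane_r(d_cm: float) -> float:
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def prob_het_at_qtl(left_het: bool, right_het: bool, d_left_cm: float, d_right_cm: float) -> float:
    """p_i2 = P(QTL is E/CL | genotypes at the two flanking backcross markers).

    Assumes no interference, so the marker-to-marker fraction is r1 + r2 - 2*r1*r2.
    """
    r1, r2 = haldane_r(d_left_cm), haldane_r(d_right_cm)
    r = r1 + r2 - 2.0 * r1 * r2
    if left_het and right_het:
        return (1 - r1) * (1 - r2) / (1 - r)
    if not left_het and not right_het:
        return r1 * r2 / (1 - r)
    if left_het:  # E/CL at the left marker, E/E at the right marker
        return (1 - r1) * r2 / r
    return r1 * (1 - r2) / r


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def lod_at_position(phenos, p_het, iters=200):
    """Return (lod, a_hat) for a putative QTL at one position, fitted by EM."""
    n = len(phenos)
    mu0 = sum(phenos) / n
    var0 = sum((x - mu0) ** 2 for x in phenos) / n  # MLEs under a = 0

    def log_lik(mu, a, var):
        return sum(math.log((1 - p) * normal_pdf(x, mu, var) + p * normal_pdf(x, mu + a, var))
                   for x, p in zip(phenos, p_het))

    mu, a, var = mu0, 0.0, var0  # start EM from the no-QTL fit
    for _ in range(iters):
        # E-step: posterior probability that each individual is E/CL at the QTL.
        w = []
        for x, p in zip(phenos, p_het):
            f_het = p * normal_pdf(x, mu + a, var)
            f_hom = (1 - p) * normal_pdf(x, mu, var)
            w.append(f_het / (f_het + f_hom))
        # M-step: weighted class means and pooled residual variance.
        sw = sum(w)
        mean_het = sum(wi * x for wi, x in zip(w, phenos)) / sw
        mean_hom = sum((1 - wi) * x for wi, x in zip(w, phenos)) / (n - sw)
        mu, a = mean_hom, mean_het - mean_hom
        var = sum((1 - wi) * (x - mu) ** 2 + wi * (x - mu - a) ** 2
                  for wi, x in zip(w, phenos)) / n

    lod = (log_lik(mu, a, var) - log_lik(mu0, 0.0, var0)) / math.log(10.0)
    return lod, a


# Simulated backcross: markers at 0 and 20 cM, QTL midway with effect a = 1.0.
rng = random.Random(42)
r_half = haldane_r(10.0)
phenos, probs = [], []
for _ in range(200):
    left = rng.random() < 0.5
    qtl = left if rng.random() >= r_half else not left
    right = qtl if rng.random() >= r_half else not qtl
    phenos.append((1.0 if qtl else 0.0) + rng.gauss(0.0, 1.0))
    probs.append(prob_het_at_qtl(left, right, 10.0, 10.0))

lod, a_hat = lod_at_position(phenos, probs)
print(f"lod = {lod:.2f}, a_hat = {a_hat:.2f}")
```

Evaluating lod_at_position at 1 cM steps along each interval would give a lod profile of the kind plotted in Figure 3.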

EXAMPLE 2 Analytical Methods Used in Mapping Quantitative
Traits Using RFLP Linkage Maps
The following is a description of a set of
analytical methods that modify and extend the classical
theory for mapping discrete Mendelian factors underlying
quantitative traits, referred to as quantitative trait
loci (QTLs). These include: (i) a method of identifying
promising crosses for QTL mapping by exploiting a
classical formula of Sewall Wright; (ii) a method
(interval mapping) for exploiting the full power of RFLP
linkage maps by adapting the approach of LOD score
analysis used in human genetics to obtain accurate
estimates of the genetic location and phenotypic effects
of QTLs; and (iii) a method (selective genotyping) that
allows a reduction of up to 7-fold in the number of
progeny that need to be scored with the DNA markers.
Figures 10 and 11 are graphs that allow geneticists to
estimate, in any particular case, the number of progeny
required to map QTLs underlying a quantitative trait.

(i) Identification of promising crosses for QTL
mapping. Genetic dissection of a quantitative trait will
succeed only when some of the QTLs segregating in the
cross have relatively large phenotypic effects. It has
been shown, through use of a classical formula of Sewall
Wright, that it is often possible to recognize such
crosses in advance and thereby to ensure that QTLs will
in fact be identified.

The basic methodology for mapping QTLs involves
arranging a cross between two inbred strains differing
substantially in a quantitative trait: segregating
progeny are scored both for the trait and for a number of
genetic markers. Typically, the segregating progeny are
produced by a B1 backcross (F1 x Parent) or an F2
intercross (F1 x F1). For simplicity, only the backcross
will be discussed in detail. As noted below, the F2
intercross is completely analogous and requires only
about half as many progeny.

Definitions and assumptions: Let A and B be inbred
strains differing for a quantitative trait of interest,
and suppose that a B1 backcross is performed with A as
the recurrent parent. Let
$(\mu_A,\sigma^2_A)$, $(\mu_B,\sigma^2_B)$, $(\mu_{F1},\sigma^2_{F1})$ and $(\mu_{B1},\sigma^2_{B1})$
denote the mean and variance of the phenotype in the A,
B, F1 and B1 populations, respectively (see Figure 4).
Let $D = \mu_B - \mu_A > 0$ denote the phenotypic difference
between the strains. The cross will be analyzed under the
classical assumption that the phenotype results from
summing the effects of individual QTL alleles and then
adding normally-distributed environmental (i.e.,
non-genetic) noise. (Mather, K. and J.L. Jinks,
Biometrical Genetics, Cornell University Press, Ithaca,
NY (1971); Falconer, D.S., Introduction to Quantitative
Genetics, Longman, London (1981)). In particular,
complete codominance and no epistasis are assumed. These
assumptions imply that:

$$\mu_{F1} = \tfrac{1}{2}(\mu_A + \mu_B) \qquad (1a)$$

$$\mu_{B1} = \tfrac{1}{2}(\mu_A + \mu_{F1}) \qquad (1b)$$

$$\sigma^2_A = \sigma^2_B = \sigma^2_{F1} < \sigma^2_{B1} \qquad (1c)$$

The variances within the A, B and F1 populations equal
the environmental variance, $\sigma^2_E$, among genetically
identical individuals, while the variance within the B1
progeny also includes the genetic variance $\sigma^2_G = \sigma^2_{B1} - \sigma^2_E$.

Frequently, phenotypic measurements must be
mathematically transformed so that parental phenotypes
are approximately normally distributed and the relations
(1a-c) are approximately satisfied. For example, Wright
(1968) obtained an excellent fit to the theory by
applying a log-transformation (appropriate when variances
scale with the mean) to tomato fruit weight.

The phenotypic effect δ of a QTL means the
additive effect of substituting both A alleles by B
alleles. A single allele has effect ½δ, since
codominance is assumed. In a backcross, the segregation
of a QTL with effect δ contributes an amount $\delta^2/16$ to the
genetic variance $\sigma^2_G$. The variance explained by the QTL
is written $\sigma^2_{exp} = \delta^2/16$, while the residual variance is
$\sigma^2_{res} = \sigma^2_{B1} - \sigma^2_{exp}$.
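
The variance bookkeeping above can be checked by simulation. The sketch below uses a hypothetical function name and follows the assumptions stated in the text: a single QTL, codominance and normal environmental noise. It draws a large B1 = F1 x A population, then compares the excess of the backcross variance over the environmental variance with δ²/16, and the backcross mean with relation (1b).

```python
import random
import statistics


def simulate_backcross(mu_a, delta, sd_env, n, rng):
    """B1 (F1 x A) phenotypes for one codominant QTL with effect delta.

    AA progeny have mean mu_a; AB progeny carry one B allele and so have mean
    mu_a + delta / 2. Environmental noise is normal with SD sd_env.
    """
    phenos = []
    for _ in range(n):
        is_ab = rng.random() < 0.5  # 1:1 segregation in a backcross
        mean = mu_a + (delta / 2.0 if is_ab else 0.0)
        phenos.append(rng.gauss(mean, sd_env))
    return phenos


rng = random.Random(7)
mu_a, delta, sd_env = 10.0, 2.0, 1.0
b1 = simulate_backcross(mu_a, delta, sd_env, 200_000, rng)

sigma2_g = statistics.pvariance(b1) - sd_env**2  # sigma2_B1 - sigma2_E
mu_f1 = mu_a + delta / 2.0                       # (1a): midpoint of A and B
print(f"sigma2_G ~ {sigma2_g:.3f}  (delta^2/16 = {delta**2 / 16:.3f})")
print(f"mu_B1 ~ {statistics.fmean(b1):.3f}  ((mu_A + mu_F1)/2 = {(mu_a + mu_f1) / 2:.3f})")
```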

The ability to map QTLs underlying a quantitative 
trait depends on the magnitude of their phenotypic 
effect: the smaller the effect that one wishes to 
detect, the more progeny will be required. Before 
attempting genetic dissection of a quantitative trait, it
would thus be desirable to identify crosses segregating
for QTLs with relatively large phenotypic effects and to
estimate the magnitude of the effects. In fact, this can
often be accomplished by exploiting a classical formula
of Wright.

Wright (quoted by Castle 1921; Wright 1968) proved
that the number of QTLs segregating in a backcross
between two strains with phenotypic difference D can be
estimated by the formula:

$$k = \frac{D^2}{16\,\sigma^2_G} \qquad (2)$$

provided that the following assumptions hold: (i) the
QTLs have effects of equal magnitude; (ii) the QTLs are
unlinked; and (iii) the alleles in the high strain all
increase the phenotype, while those in the low strain
decrease the phenotype. The estimate k is called the
number of effective factors in the cross. If the
assumptions are satisfied, then each QTL affects the
phenotype by (D/k) and explains (1/k) of the genetic
variance in the backcross.
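
Wright's formula is simple to apply once the means and variances are known. The sketch below uses hypothetical helper names. It estimates σ²_G as σ²_B1 - σ²_E, following the definition of the genetic variance given above, and checks two idealized crosses. The first satisfies the equal-effects assumption. The second has one large QTL and two small ones, and there k falls well below the true number of QTLs.

```python
def effective_factors(mu_a, mu_b, var_b1, var_env):
    """Wright's number of effective factors, k = D^2 / (16 * sigma2_G)."""
    d = mu_b - mu_a
    sigma2_g = var_b1 - var_env  # genetic variance in the backcross
    if sigma2_g <= 0:
        raise ValueError("backcross variance must exceed environmental variance")
    return d**2 / (16.0 * sigma2_g)


def ideal_cross(deltas, var_env=1.0):
    """(D, var_B1) for unlinked QTLs whose B alleles all increase the trait.

    Each QTL with effect delta adds delta^2 / 16 to the backcross variance.
    """
    d = sum(deltas)
    return d, var_env + sum(x**2 for x in deltas) / 16.0


for deltas in ([1.0, 1.0, 1.0, 1.0], [3.0, 0.5, 0.5]):
    d, var_b1 = ideal_cross(deltas)
    print(len(deltas), "QTLs -> k =", round(effective_factors(0.0, d, var_b1, 1.0), 2))
```

The equal-effects cross gives k = 4.0 for four QTLs. The unequal cross has three QTLs but gives k ≈ 1.68, because one QTL dominates the genetic variance.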

Unfortunately, if these assumptions are not
satisfied (as will be likely in practice), the number of
effective factors k may seriously underestimate the
number of QTLs. In principle, the number of QTLs is
unlimited. In this case, must there exist any QTLs
affecting the phenotype by at least (D/k)? More generally, for
any 0 < ε < 1, must there exist QTLs affecting the
phenotype by at least ε(D/k)? And how much of the total
phenotypic difference D and the genetic variance $\sigma^2_G$ can
be attributed to such QTLs? Proposition 1 (proven in
Appendix [A1]) supplies an answer.

Proposition 1

Consider a cross in which the phenotypic difference
between the strains is D and the number of effective
factors is k. Assume that the QTLs are unlinked and that
the alleles in the 'high' strain all increase the
phenotype. No matter how many QTLs are segregating and
no matter what their individual phenotypic effects, the
set of QTLs that alter the phenotype by at least ε(D/k)
must together account for a fraction ≥ $D_\varepsilon$ of the total
phenotypic difference D between the strains and must
together explain a fraction ≥ $V_\varepsilon$ of the genetic variance
in the second generation, where

$$D_\varepsilon = \frac{\tfrac{1}{2}\varepsilon + \sqrt{(1-\varepsilon)k + \tfrac{1}{4}\varepsilon^2}}{k}, \qquad V_\varepsilon = 1 - \varepsilon\,(1 - D_\varepsilon).$$

Considering the case ε = 1, the proposition states
that the QTLs with phenotypic effect ≥ (D/k) must account
for a phenotypic difference of at least (D/k). In other
words, there must exist at least one QTL having
phenotypic effect ≥ (D/k).

Consider the search for QTLs with somewhat smaller
effects. How much of the phenotypic difference can be
attributed to QTLs with effect ≥ ½(D/k)? Taking ε = ½
and considering various values of k gives the
following:

| k | Minimum proportion of phenotypic difference D accounted for by QTLs with effect ≥ ½(D/k) | Minimum proportion of genetic variance σ²_G explained by QTLs with effect ≥ ½(D/k) |
|---|---|---|
| 2 | 64% | 82% |
| 3 | 50% | 75% |
| 4 | 42% | 71% |
| 5 | 37% | 69% |



A small value of k thus implies that the cross must be
segregating for QTLs with relatively large effects
(≥ ½(D/k)), which together account for a substantial
proportion of the phenotypic difference and explain a
substantial proportion of the genetic variance in the
backcross.

In other words, Wright's formula can be used to
indicate the presence of some QTLs with large effects,
even though the number k of effective factors may not be
a reliable estimate of the total number of QTLs. Note
that Proposition 1 provides only worst-case lower bounds:
in general, the QTLs with large effects will have an even
greater effect.
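
The bounds in Proposition 1 can be evaluated directly. The sketch below uses a hypothetical function name and takes D_ε = [½ε + √((1-ε)k + ¼ε²)]/k and V_ε = 1 - ε(1 - D_ε). These forms reproduce every entry of the ε = ½ table and give D_ε = 1/k when ε = 1, as the text states.

```python
import math


def proposition1_bounds(k: float, eps: float):
    """Lower bounds (D_eps, V_eps) on the share of D and of sigma2_G due to
    QTLs with effect at least eps * (D / k)."""
    d_eps = (0.5 * eps + math.sqrt((1.0 - eps) * k + 0.25 * eps**2)) / k
    v_eps = 1.0 - eps * (1.0 - d_eps)
    return d_eps, v_eps


for k in (2, 3, 4, 5):
    d_eps, v_eps = proposition1_bounds(k, 0.5)
    print(k, f"{d_eps:.0%}", f"{v_eps:.0%}")
# prints: 2 64% 82% / 3 50% 75% / 4 42% 71% / 5 37% 69%
```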

How serious a limitation is posed by the two
assumptions remaining in Proposition 1?

(i) The first assumption is not essential:
admitting the possibility of linked QTLs simply allows
that some large QTL effects may eventually prove to be
due to several nearby genes. Such questions may be
safely neglected at first.

(ii) The second assumption is more important.
Fortunately, it is possible to choose crosses in which it
is likely to be satisfied. The ideal situation would be
two strains arising from brief, intense artificial
selection for and against the trait in an outbred
population, followed by inbreeding. In such a case,
classical selection theory shows that a "high" strain is
unlikely to fix a "low" allele at QTLs with relatively
large effect; moreover, the force of selection will be
greatest on the QTLs with the largest effects. (Falconer,
D.S., Introduction to Quantitative Genetics, Longman,
London (1981)). Many such strains have been developed to
study various physiological traits. As a reasonable
alternative, one could use strains that appear to have
resulted from natural selection for the trait.

Judicious choice of strains can essentially assure
that some QTLs will be detected with a reasonable progeny
size that can be calculated in advance. When studying
strains resulting from selection, a reasonable approach
to mapping QTLs would be to use enough progeny to map
QTLs having effect δ between ½(D/k) and (D/k). Of
course, one could choose to study more progeny and might
well be rewarded with the detection of QTLs with smaller
effects.

Unselected strains exhibiting extreme phenotypic
differences may also merit attention: QTLs with large
effects may well be segregating, despite the lack of a
mathematical guarantee. When there is clear evidence of
both high and low alleles within a strain, as when many
segregating progeny exhibit phenotypes more extreme than
either parent, the analysis above does not apply; the
detection level for QTLs must then be chosen somewhat
arbitrarily. When there is no such evidence, one might
proceed as above in choosing a progeny size. Although
the existence of QTLs having a given effect is no longer



20 



25 



WO 90/04651 PCT/US89/04688 



-36- 



assured, the methods described below for detecting QTLs 
in a cross are unchanged. 

Assuming that the desired detection level $\delta$ has been
chosen, the method for mapping QTLs and the number of
progeny required are considered next.

(ii) Exploit the full power of complete linkage
maps. The traditional approach to mapping QTLs involves
studying single genetic markers one at a time (Sax, K.,
Genetics, 8:552-560 (1923); Soller, M. and T. Brody,
Theor. Appl. Genet., 47:35-39 (1976)). The drawbacks of
this method are that (a) the phenotypic effects of QTLs
are systematically underestimated, (b) the genetic
locations of QTLs are not well resolved because distant
linkage cannot be distinguished from small phenotypic
effect, and (c) the number of progeny required for
detecting QTLs is larger than necessary. These problems
can be remedied by interval mapping of QTLs, an approach
adapted from the method of LOD scores used in human
genetic linkage analysis. In addition, the traditional
approach neglects the problem that testing many genetic
markers increases the risk that false positives will
occur. As described below, the appropriate degree of
statistical stringency to prevent such errors in mapping
QTLs has been determined.



Traditional Approach to Mapping QTLs

The traditional approach for detecting a QTL near a
genetic marker involves comparing the phenotypic means
for two classes of progeny: those with marker genotype
AB and those with marker genotype AA. The difference
between the means provides an estimate of the phenotypic
effect of substituting a B allele for an A allele at the
QTL. To test whether the inferred phenotypic effect is
significantly different from 0, one applies a simple
statistical test (amounting to linear regression or
one-way analysis of variance) under the assumption of
normally distributed environmental variance.

Consider a QTL that contributes $\sigma^2_{exp}$ to the genetic
variance. Supposing that such a QTL were located exactly
at a marker locus, the number of progeny required for
detection would be approximately

$$ (Z_\alpha)^2 \, (\sigma^2_{res}/\sigma^2_{exp}), \qquad (3) $$

where this progeny size affords a 50% probability of
detection if such a QTL is actually present and a
probability $\alpha$ of a false positive if no QTL is linked.
Here, $Z_\alpha$ is defined by the equation $\Pr(z > Z_\alpha) = \alpha$,
where $z$ is a standard normal variable (i.e., $Z_\alpha$ is
the number of standard deviations beyond which the normal
curve contains probability $\alpha$). Soller and Brody suggest
allowing a false positive rate of $\alpha = 0.05$ (Soller, M.
and T. Brody, Theor. Appl. Genet., 47:35-39 (1976)). For
a given false positive rate, the required progeny size
thus essentially scales inversely with the square of the
phenotypic effect of the QTL or, equivalently, inversely
with the variance explained.
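The following is a minimal Python sketch of equation (3), not part of the original method. It computes the approximate progeny size from the ratio $\sigma^2_{exp}/\sigma^2_{res}$ and takes $Z_\alpha$ as the one-sided upper normal quantile, exactly as defined above. The function name is hypothetical. The loop also shows the inverse scaling: quadrupling the variance explained cuts the progeny size by a factor of four.

```python
from statistics import NormalDist

def traditional_progeny_size(var_ratio: float, alpha: float) -> float:
    """Equation (3): (Z_alpha)^2 * (sigma2_res / sigma2_exp).

    var_ratio is sigma2_exp / sigma2_res; Z_alpha satisfies Pr(z > Z_alpha) = alpha.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return z_alpha ** 2 / var_ratio

for var_ratio in (0.01, 0.04, 0.16):
    n = traditional_progeny_size(var_ratio, 0.05)
    print(f"sigma2_exp/sigma2_res = {var_ratio:.2f}: about {n:.0f} progeny")
```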

Although it captures the key features of QTL
mapping, the traditional approach has a number of
shortcomings:

(i) If the QTL does not lie at the marker locus,
its phenotypic effect may be seriously underestimated.
If the recombination fraction is $\theta$, the inferred
phenotypic effect of the QTL is biased downward by a
factor of $(1-2\theta)$.
(ii) If the QTL does not lie at the marker locus,
substantially more progeny may be required. In
particular, the variance explained by the marker
decreases by a factor of $(1-2\theta)^2$ and the number of
progeny consequently increases by a factor of $1/(1-2\theta)^2$.
For an RFLP map with markers every 10, 20, 30 or 40 cM
throughout the genome, the progeny size would need to be
increased by 22%, 49%, 82% or 123%, respectively, to
account for the possibility that the QTL might lie in the
middle of an interval (i.e., at the maximum distance from
the nearest RFLP).

(iii) The approach does not define the likely
position of the QTL. In particular, it cannot
distinguish between tight linkage to a QTL with small
effect and loose linkage to a QTL with large effect.

(iv) The suggested false positive rate of $\alpha = 0.05$
neglects the fact that many markers are being tested.
While the chance of a false positive at any given marker
is only 5%, the chance that a false positive will occur
somewhere in the genome is much higher.

These difficulties stem from the fact that single
markers are analyzed one at a time. To remedy these
problems, the approach is generalized, as described in
the following section, so that the full power of an RFLP
linkage map can be exploited to scan the intervals
between markers as well.
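A short Python sketch can reproduce the bias and progeny penalties in items (i) and (ii). It assumes Haldane's map function (no interference), $\theta = \tfrac{1}{2}(1 - e^{-2d})$ with $d$ in Morgans, so that $1 - 2\theta = e^{-2d}$. The text does not name a map function here, but this choice reproduces the quoted 22%, 49%, 82% and 123%. The helper names are illustrative.

```python
import math

def haldane_theta(d_morgans: float) -> float:
    """Recombination fraction for map distance d (Morgans) under Haldane's map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def single_marker_penalty(spacing_cm: float) -> tuple[float, float]:
    """For a QTL midway between markers spaced `spacing_cm` apart, return
    (downward bias factor on the effect, relative increase in progeny size)."""
    theta = haldane_theta(spacing_cm / 2.0 / 100.0)   # distance to the nearest marker
    bias = 1.0 - 2.0 * theta                          # effect estimate shrinks by (1 - 2*theta)
    extra_progeny = 1.0 / bias ** 2 - 1.0             # progeny grow by 1/(1 - 2*theta)^2
    return bias, extra_progeny

for spacing in (10, 20, 30, 40):
    bias, extra = single_marker_penalty(spacing)
    print(f"{spacing} cM map: effect x {bias:.3f}, progeny +{100 * extra:.0f}%")
```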

QTL Mapping: Interval Mapping Using LOD Scores

Method of maximum likelihood: The traditional
approach, involving linear regression of phenotype on
genotype, is a special case of the method of maximum
likelihood. Formally, the phenotype $\phi_i$ and genotype $g_i$
of individual $i$ are assumed to be related by the equation
$$ \phi_i = a + b g_i + \epsilon, $$

where $g_i$ is encoded as a (0,1)-indicator variable, $\epsilon$ is a
random normal variable with mean 0 and variance $\sigma^2$, and
$a$, $b$ and $\sigma^2$ are unknown parameters. Here, $b$ denotes the
estimated phenotypic effect of a putative QTL.

The linear regression solutions $(a^*, b^*, \sigma^{2*})$ are, in
fact, maximum likelihood estimates (MLEs) for the
parameters; that is, they are the values which maximize
the probability $L(a, b, \sigma^2)$ that the observed data would
have occurred. Here,

$$ L(a, b, \sigma^2) = \prod_i z\big(\phi_i - (a + b g_i), \sigma^2\big), \qquad (4) $$

where $z(x, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp(-x^2/2\sigma^2)$ is the probability
density for the normal distribution with mean 0 and
variance $\sigma^2$. The MLEs are compared to the constrained
MLEs obtained under the assumption that $b = 0$,
corresponding to the assumption that no QTL is linked.
These constrained MLEs are easily seen to be
$(\mu_{BC1}, 0, \sigma^2_{BC1})$, the mean and variance of the backcross
phenotypes. The evidence for a QTL is then summarized by
the LOD score:

$$ \mathrm{LOD} = \log_{10}\!\left( L(a^*, b^*, \sigma^{2*}) \,/\, L(\mu_{BC1}, 0, \sigma^2_{BC1}) \right), $$

essentially indicating how much more likely the data are
to have arisen assuming the presence of a QTL than
assuming its absence. (The choice of $\log_{10}$ accords with
longstanding practice in human genetics, although $\log_e$
would be slightly more convenient below.) (Morton, N.E.,
Am. J. Hum. Genet., 7:277-318 (1955).) If the LOD score
exceeds a predetermined threshold $T$, a QTL is declared to
be present. The important issues are: (i) What LOD
threshold $T$ should be used in order to maintain an
acceptably low rate of false positives? (ii) What is the
expected contribution to the LOD score (called the ELOD)
from each additional progeny? The number of progeny
required is then $T/\mathrm{ELOD}$, which provides even odds of
detecting the QTL with the desired false positive rate.
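For a single fully informative marker, the LOD score above has a closed form. At the MLEs each likelihood equals $(2\pi e\,\hat\sigma^2)^{-n/2}$, so $\mathrm{LOD} = (n/2)\log_{10}(\sigma^2_{BC1}/\sigma^{2*})$. The Python sketch below computes the LOD both directly from likelihood (4) and from this shortcut, and the two agree. The function names are illustrative and the phenotype values are invented for the example.

```python
import math

def log10_lik(phen, geno, a, b, var):
    """log10 of likelihood (4): prod_i z(phi_i - (a + b*g_i), var)."""
    ln = sum(-0.5 * math.log(2 * math.pi * var) - (y - a - b * g) ** 2 / (2 * var)
             for y, g in zip(phen, geno))
    return ln / math.log(10)

def single_marker_lod(phen, geno):
    """LOD for a QTL at a marker; geno holds 0 (AA) or 1 (AB) per individual."""
    n = len(phen)
    g0 = [y for y, g in zip(phen, geno) if g == 0]
    g1 = [y for y, g in zip(phen, geno) if g == 1]
    a_hat = sum(g0) / len(g0)                  # MLE of a: mean of the AA class
    b_hat = sum(g1) / len(g1) - a_hat          # MLE of b: difference of class means
    var1 = sum((y - a_hat - b_hat * g) ** 2 for y, g in zip(phen, geno)) / n
    mu = sum(phen) / n                         # constrained MLEs under b = 0
    var0 = sum((y - mu) ** 2 for y in phen) / n
    lod = log10_lik(phen, geno, a_hat, b_hat, var1) - log10_lik(phen, geno, mu, 0.0, var0)
    shortcut = 0.5 * n * math.log10(var0 / var1)
    return b_hat, lod, shortcut

phen = [9.8, 10.4, 10.1, 9.6, 11.2, 11.9, 10.8, 11.5]
geno = [0, 0, 0, 0, 1, 1, 1, 1]
b_hat, lod, shortcut = single_marker_lod(phen, geno)
print(f"b* = {b_hat:.3f}, LOD = {lod:.3f} (closed form {shortcut:.3f})")
```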

When only a single genetic marker is being tested,
these questions are easily answered. (i) By a general
result about maximum likelihood estimation in large
samples, LOD is asymptotically distributed as
$\tfrac{1}{2}(\log_{10} e)\chi^2$, where $\chi^2$ denotes the chi-squared
distribution with one degree of freedom (Kendall, M. and
A. Stuart, The Advanced Theory of Statistics, Vol. 2,
Griffin: London (1979)). A false positive rate of $\alpha$ will
thus result if the LOD threshold is chosen so that
$T = \tfrac{1}{2}(\log_{10} e)(Z_\alpha)^2$. For the 5% error rate suggested by
Soller and Brody, the threshold is $T = 0.83$. The
question of the appropriate threshold when many markers
are being tested is postponed temporarily. (ii) For a
QTL contributing $\sigma^2_{exp}$ to the backcross variance, the
expected LOD score per progeny (ELOD) is

$$
\begin{aligned}
\mathrm{ELOD} &= \tfrac{1}{2}\log_{10}\left(1 + \sigma^2_{exp}/\sigma^2_{res}\right) && (5a)\\
&\approx \tfrac{1}{2}(\log_{10} e)\left(\sigma^2_{exp}/\sigma^2_{res}\right) && (5b)\\
&\approx 0.22\left(\sigma^2_{exp}/\sigma^2_{res}\right) && (5c)
\end{aligned}
$$

where (5a) follows from well-known results about linear
regression and (5b) follows from a Taylor expansion for
small values of $\sigma^2_{exp}/\sigma^2_{res}$. Combining these two
results, the number of progeny required so that the LOD
score is expected to exceed $T$ is

$$ T/\mathrm{ELOD} \approx (Z_\alpha)^2 \left(\sigma^2_{res}/\sigma^2_{exp}\right). \qquad (6) $$
This confirms that the maximum likelihood approach agrees 
with the result (3) from the traditional approach above, 
when examining effects at a single marker locus. The more 
general framework of maximum likelihood, however, allows 
the method to be generalized to several more complex 
situations described below. 
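A few lines of Python evaluate equations (5a) to (6) together with the single-test threshold. This is an illustrative sketch with hypothetical function names. One detail is worth making explicit. The quoted value $T = 0.83$ corresponds to $Z = 1.96$, which $|z|$ exceeds with probability 0.05; equivalently, $Z^2 = 3.84$ is the 5% upper point of $\chi^2$ with one degree of freedom. The sketch therefore takes the threshold from this two-sided quantile rather than from the one-sided $Z_\alpha$.

```python
import math
from statistics import NormalDist

LOG10_E = math.log10(math.e)

def single_test_threshold(alpha: float) -> float:
    """LOD threshold T = 0.5*log10(e)*Z^2 with Pr(chi2_1 > Z^2) = alpha (two-sided Z)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 0.5 * LOG10_E * z * z

def elod_exact(ratio: float) -> float:
    """Equation (5a); ratio = sigma2_exp / sigma2_res."""
    return 0.5 * math.log10(1.0 + ratio)

def elod_approx(ratio: float) -> float:
    """Equation (5b), the first-order Taylor approximation (about 0.22 * ratio)."""
    return 0.5 * LOG10_E * ratio

T = single_test_threshold(0.05)
print(f"T = {T:.2f}")  # prints T = 0.83
for ratio in (0.02, 0.1, 0.5):
    print(f"ratio {ratio}: T/ELOD = {T / elod_exact(ratio):.0f} "
          f"(approx. {T / elod_approx(ratio):.0f})")
```

For small effects the two progeny counts agree closely. As the ratio grows, (5b) overstates the ELOD and so understates the number of progeny needed.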

(iii) Decrease the number of progeny to be
genotyped. In typical cases, a reduction of up to 7-fold
can be achieved by combining two approaches: interval
mapping and selective genotyping. Selective genotyping
involves growing a larger population, but genotyping only
those individuals whose phenotypes deviate substantially
from the mean. Additional methods for increasing the
power of QTL mapping include reducing environmental noise
by progeny testing and reducing genetic noise by studying
several genetic regions simultaneously.

Interval mapping: If genetic markers have been
scored throughout the genome, the method of maximum
likelihood can be used as above to estimate the
phenotypic effect and the LOD score for a putative QTL at
any given genetic location (Lander, E.S. and D.
Botstein, Proceedings of the National Academy of
Sciences USA, 83:7353-7357 (1986); Lander, E.S. and D.
Botstein, Cold Spring Harbor Symp. Quant. Biol., 51:49-62
(1986)). The main difference is that the QTL genotype
for individual $i$ is unknown. The appropriate likelihood
function is therefore

$$ L(a, b, \sigma^2) = \prod_i \left[ G_i(0) L_i(0) + G_i(1) L_i(1) \right], \qquad (7) $$

where $L_i(x) = z\big(\phi_i - (a + bx), \sigma^2\big)$ denotes the likelihood
function for individual $i$ assuming that $g_i = x$ and
$G_i(x)$ denotes the probability that $g_i = x$ conditional on
the genotypes and positions of the flanking markers.
(Given a map function, $G_i$ is easily computed. For
example, suppose the flanking markers both have genotype
AA in an individual and lie at recombination fractions
$\theta$ and $\theta'$ from the putative QTL. Assuming no
interference, the probability of the QTL genotype being
AB is then $\theta\theta'/(1 - \theta - \theta' + 2\theta\theta')$, which is approximately
$\theta\theta'$ for closely spaced markers.) Note that (7) reduces
to (4) in the special case that the QTL lies at a marker
locus and the genotype $g_i$ is thus known with certainty.
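The conditional probabilities $G_i(x)$ for a backcross can be sketched as follows. The sketch assumes no interference, genotypes coded 0 (AA) and 1 (AB), and recombination fractions that have already been converted from map distances. The function name is hypothetical.

```python
def qtl_prob_ab(left: int, right: int, theta_l: float, theta_r: float) -> float:
    """G_i(1): probability that the QTL genotype is AB (coded 1), given the flanking
    marker genotypes (0 = AA, 1 = AB) in a backcross, assuming no interference.

    theta_l, theta_r: recombination fractions between the QTL and the left/right marker.
    """
    def step(a: int, b: int, theta: float) -> float:
        # Probability of passing from genotype a to genotype b across one interval.
        return theta if a != b else 1.0 - theta

    # Joint weight of each QTL genotype q with the observed flanking genotypes;
    # the common factor 1/2 for the left marker's genotype cancels in the ratio.
    weights = {q: step(left, q, theta_l) * step(q, right, theta_r) for q in (0, 1)}
    return weights[1] / (weights[0] + weights[1])

print(qtl_prob_ab(0, 0, 0.1, 0.1))   # both flanking markers AA
print(qtl_prob_ab(0, 1, 0.2, 0.2))   # recombinant flanks, QTL midway
```

In the second call the flanking markers have recombined around a QTL midway between them, so the QTL genotype is left at one half and the individual carries no linkage information.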

Finding the maximum likelihood solution $(a^*, b^*, \sigma^{2*})$
to (7) can be regarded as a linear regression problem
with missing data: none of the independent variables
(genotypes) is known; only probability distributions for
each are available. Although standard computer programs
for linear regression cannot be used, techniques for
maximum likelihood estimation with missing data have been
developed in recent years (Little, R.J.A. and D.B.
Rubin, Statistical Analysis with Missing Data, Wiley, NY
(1987)). By adapting the EM algorithm, a computer program,
MAPMAKER-QTL, has been written to compute LOD scores for
putative QTLs (Dempster, A.P. et al., J. Roy. Statis.
Soc., 39:1-38 (1977); Lander, E.S. and P. Green,
Proceedings of the National Academy of Sciences USA,
84:2363-2367 (1987)).
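The following minimal EM sketch for likelihood (7) makes the missing-data idea concrete. It is not MAPMAKER-QTL. It assumes that the prior probabilities $G_i(1)$ have already been computed from the flanking markers and that both QTL genotype classes receive positive weight. The function names are illustrative.

The E-step replaces each unknown $g_i$ by its posterior probability given the current parameters. The M-step is then a weighted regression whose solution is two weighted class means and a pooled variance.

```python
import math
import random

def normal_pdf(x: float, var: float) -> float:
    """z(x, var): normal density with mean 0 and variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def log10_lik7(phen, prior1, a, b, var):
    """log10 of likelihood (7) with G_i(1) = prior1[i] and G_i(0) = 1 - prior1[i]."""
    return sum(math.log10((1 - p) * normal_pdf(y - a, var) + p * normal_pdf(y - a - b, var))
               for y, p in zip(phen, prior1))

def em_interval_mapping(phen, prior1, max_iter=500, tol=1e-9):
    """Return (a*, b*, var*, LOD) for a putative QTL with genotype priors prior1."""
    n = len(phen)
    mu = sum(phen) / n
    var0 = sum((y - mu) ** 2 for y in phen) / n
    a, b, var = mu, 0.0, var0                      # start from the no-QTL solution
    old = new = log10_lik7(phen, prior1, a, b, var)
    for _ in range(max_iter):
        # E-step: posterior probability that individual i has QTL genotype 1.
        w = []
        for y, p in zip(phen, prior1):
            l0 = (1 - p) * normal_pdf(y - a, var)
            l1 = p * normal_pdf(y - a - b, var)
            w.append(l1 / (l0 + l1))
        # M-step: weighted class means and pooled residual variance.
        s1 = sum(w)
        s0 = n - s1
        a = sum((1 - wi) * y for wi, y in zip(w, phen)) / s0
        b = sum(wi * y for wi, y in zip(w, phen)) / s1 - a
        var = sum((1 - wi) * (y - a) ** 2 + wi * (y - a - b) ** 2
                  for wi, y in zip(w, phen)) / n
        new = log10_lik7(phen, prior1, a, b, var)
        if new - old < tol:                        # EM never decreases the likelihood
            break
        old = new
    lod = new - log10_lik7(phen, prior1, mu, 0.0, var0)
    return a, b, var, lod

rng = random.Random(1)
geno = [rng.randint(0, 1) for _ in range(200)]
phen = [10.0 + 1.0 * g + rng.gauss(0.0, 1.0) for g in geno]
# Marker exactly at the QTL: each G_i(1) is 0 or 1, so (7) reduces to (4).
print(em_interval_mapping(phen, [float(g) for g in geno]))
```

In practice `prior1` would come from the flanking-marker genotypes at each position scanned along the chromosome.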

To illustrate the method, simulated data from many
backcrosses have been analyzed. Figure 5 presents a QTL
likelihood map, showing how the LOD score varies
throughout a genome, for a simulated data set involving
250 backcross progeny segregating for five QTLs with
various allelic effects. Based on the assumed genome
size and density of markers, a LOD score of 2.4 is
required (see below) for declaring the presence of a QTL.
Four of the five QTLs exceed this threshold; the fifth
does not attain statistical significance. The probable
position of each QTL is indicated by confidence
intervals, defined by the points on the map at which the
likelihood ratio has fallen by a factor of 10 from the
maximum (... Linkage, Johns Hopkins University,
Baltimore (1985)).

Among the advantages of the approach are:

(i) The QTL likelihood map clearly presents the
evidence for QTLs throughout the entire genome.

(ii) In contrast to the traditional approach, the
inferred phenotypic effects are asymptotically unbiased,
because they are MLEs for a correctly specified model
(Kendall, M. and A. Stuart, The Advanced Theory of
Statistics, Vol. 2, Griffin: London (1979)).

(iii) The probable position of the QTL is indicated by
confidence intervals, giving the range of points for
which the likelihood ratio lies within a factor of 10 (or
100, if desired) of the maximum.

(iv) Interval mapping requires fewer progeny than the
traditional approach for the detection of QTLs. In
meioses in which the flanking markers do not recombine,
the chance of a double crossover is small (e.g., at most
1% in the case of a 20 cM RFLP map). In essence, the
flanking markers act as a single virtual marker in such
meioses. Supposing that genetic markers are available
every $d$ cM and considering the
(worst) case of a QTL in the middle of an interval, one
can show (Appendix [A2]) that

$$ \mathrm{ELOD}_{\text{interval mapping}} = (1-2\theta)^2\,\mathrm{ELOD}/(1-\Theta), \qquad (8a) $$

where $\Theta$ is the recombination fraction corresponding to $d$
cM, $\theta$ is the recombination fraction corresponding to $\tfrac{1}{2}d$
cM, and ELOD is the expected LOD score for a marker
located exactly at the QTL. By contrast, recall that

$$ \mathrm{ELOD}_{\text{single markers}} = (1-2\theta)^2\,\mathrm{ELOD}. \qquad (8b) $$

Interval mapping thus decreases the required number of
progeny by a factor of $(1-\Theta)$, which is exactly the
proportion of meioses in which the flanking markers do
not recombine. For maps with $d$ = 10, 20, 30 and 40 cM, the
savings are 9%, 16%, 23% and 28%, respectively.
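A short Python check of (8a), (8b) and the quoted savings follows. It assumes Haldane's map function, which corresponds to the no-interference assumption above: under it, $\Theta = 2\theta(1-\theta)$. Because the progeny needed scale as $1/\mathrm{ELOD}$, the saving is $1 - \mathrm{ELOD}_{\text{single}}/\mathrm{ELOD}_{\text{interval}} = \Theta$. The helper names are illustrative.

```python
import math

def haldane_theta(d_cm: float) -> float:
    """Haldane map function (no interference); d in centimorgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def interval_vs_single(d_cm: float) -> tuple[float, float]:
    """ELODs, as fractions of the ELOD at the QTL itself, for a QTL midway
    between markers d_cm apart: equations (8a) and (8b)."""
    big_theta = haldane_theta(d_cm)            # between the two markers
    theta = haldane_theta(d_cm / 2.0)          # between the QTL and either marker
    single = (1.0 - 2.0 * theta) ** 2                        # (8b)
    interval = (1.0 - 2.0 * theta) ** 2 / (1.0 - big_theta)  # (8a)
    return interval, single

for d in (10, 20, 30, 40):
    interval, single = interval_vs_single(d)
    saving = 1.0 - single / interval           # fewer progeny with interval mapping
    print(f"d = {d} cM: interval {interval:.3f}, single {single:.3f}, saving {100 * saving:.0f}%")
```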

(v) QTL likelihood maps can also be used to
recognize a pair of linked QTLs, provided that they are
not so close that recombination between them is very
rare. Holding fixed the position of one QTL, the
increase in LOD score caused by a second putative QTL can
be computed for each position along the chromosome. An
example is shown in Figure 6.

In addition to being tested on numerous simulated
data sets, interval mapping has recently been applied to
an interspecific backcross in tomato: six QTLs affecting
tomato fruit weight, four QTLs affecting the
concentration of soluble solids, and five QTLs affecting
fruit pH were mapped to intervals of about 20-30 cM
(Paterson, A.H. et al., submitted (1988)). In general,
interval mapping should prove valuable for analyzing and
presenting
evidence for QTLs, and for decreasing somewhat the number
of progeny required to detect QTLs of a given magnitude.

Appropriate threshold for LOD scores: When an
entire genome is tested for the presence of QTLs, the
usual nominal significance level of 5%, corresponding to
a LOD score of 0.83, is clearly inadequate. Indeed,
applying this standard would have resulted in a spurious
QTL being declared on chromosome 10 in Figure 5. The
appropriate threshold depends on the size of the genome
and the density of markers genotyped.

To determine the correct LOD threshold, it is useful
to consider two limiting situations: (i) the sparse-map
case, in which consecutive markers are well separated, and
(ii) the dense-map case, in which the spacing between
consecutive markers approaches zero. In each case, the
issue is: if no QTLs are segregating, what is the chance
that the LOD score will exceed the threshold $T$ somewhere
in the genome?

In the sparse-map case, occurrences of spuriously
high LOD scores are essentially independent. To achieve
an overall significance level of $\alpha$ when $M$ intervals are
tested, a nominal significance level of $\alpha/M$ should be
required for each individual test, corresponding to a LOD
threshold of $\tfrac{1}{2}(\log_{10} e)(Z_{\alpha/M})^2$.

In the dense-map case, occurrences of spuriously
high LOD scores at nearby markers are no longer
independent events. Even if markers were typed
continuously throughout the genome, the statistical
penalty to be paid would be bounded. In fact, as shown in
the Appendix [A3], in the limit of an infinitely dense map
and a large progeny size, the LOD score varies according
to the square of an Ornstein-Uhlenbeck diffusion process.
Well known in physics and
engineering, the Ornstein-Uhlenbeck diffusion describes
a particle executing Brownian motion while being coupled
to the origin by a weak spring. The extreme value
properties of this diffusion have been extensively
studied, and the results immediately translate into
statements about how high a LOD score will be expected by
chance, given the size of the genome (Leadbetter, M.R.
et al., Extremes and Related Properties of Random
Sequences and Processes, Springer, NY (1983)).

Specifically, for a high threshold $T$, the following
result holds (see Appendix [A3]):

Proposition 2. Consider an organism with $C$
chromosomes and genetic length $G$, measured in Morgans.
When no QTLs are present, the probability that the LOD
score exceeds a high level $T$ is $\approx (C + 2Gt)\,\chi^2(t)$, where
$t = (4.6)T$ and $\chi^2(t)$ denotes the probability that a
chi-squared variable with one degree of freedom exceeds
$t$. In order to make the probability less than $\alpha$ that a
false positive occurs somewhere in the genome, the
appropriate LOD threshold is thus $\approx T_\alpha = t_\alpha/4.6$, where $t_\alpha$
solves the equation $\alpha = (C + 2Gt_\alpha)\,\chi^2(t_\alpha)$.
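Both limiting cases are easy to evaluate numerically. The Python sketch below computes the sparse-map Bonferroni threshold and solves the Proposition 2 equation by bisection. The function names are illustrative. The sketch uses the following conventions:

- the exact factor $2\ln 10 \approx 4.605$ in place of 4.6;
- the identity $\Pr(\chi^2_1 > t) = \operatorname{erfc}(\sqrt{t/2})$;
- a two-sided normal quantile, as in the single-test case, so that one test at 5% gives $T \approx 0.83$.

For the tomato genome quoted below ($C = 12$, $G = 11$), the sparse-map bound with $M = 55$ intervals of 20 cM comes out at about 2.4, and the dense-map limit at about 3.1. Treating all 55 intervals as independent tests is a simplifying assumption.

```python
import math
from statistics import NormalDist

TWO_LN10 = 2.0 * math.log(10.0)   # t = (2 ln 10) T, about 4.6 T

def chi2_1_tail(t: float) -> float:
    """Pr(chi-squared with 1 degree of freedom > t)."""
    return math.erfc(math.sqrt(t / 2.0))

def sparse_threshold(alpha: float, m_tests: int) -> float:
    """Bonferroni LOD threshold: nominal level alpha/M for each independent test."""
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * m_tests))
    return z * z / TWO_LN10

def dense_threshold(alpha: float, n_chrom: int, length_morgans: float) -> float:
    """Solve alpha = (C + 2 G t) * chi2_1_tail(t) for t (Proposition 2); return t / (2 ln 10)."""
    def excess(t: float) -> float:
        return (n_chrom + 2.0 * length_morgans * t) * chi2_1_tail(t) - alpha
    lo, hi = 2.0, 200.0            # excess() is strictly decreasing for t >= 2
    if excess(lo) <= 0:
        raise ValueError("alpha too large for the high-threshold approximation")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / TWO_LN10

print(f"one test:              T = {sparse_threshold(0.05, 1):.2f}")
print(f"55 sparse intervals:   T = {sparse_threshold(0.05, 55):.2f}")
print(f"dense map, C=12, G=11: T = {dense_threshold(0.05, 12, 11.0):.2f}")
```

Replacing $G$ by $2G$ in `dense_threshold` raises the tomato value by roughly 0.3 in this sketch.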

For both the sparse-map and dense-map cases, a
standard chi-square table may be used to calculate the
LOD score threshold corresponding to a 5% chance that
even a single false positive will occur. For
intermediate situations, extensive numerical simulation
was used to determine the appropriate LOD thresholds as a
function of genome size and marker spacing (Figure 7).
Typically, a LOD score of between 2 and 3 is required to
ensure an overall false positive rate of 5%. For
instance, analyzing the domestic tomato ($C = 12$, $G = 11$)
with a 20 cM RFLP map requires a LOD threshold of 2.4,
equivalent to applying a nominal significance level of
about $\alpha' = 0.001$ for each individual test performed. If
the nominal 5% significance level (LOD > 0.83) were used
instead, the probability would exceed 90% that a false 
positive would arise somewhere in the genome. Indeed, a 
LOD score of 1.5 occurred by chance on chromosome 10 in 
the simulated data shown in Figure 5. 

Number of progeny required: Given the ELOD for a
QTL as a function of its phenotypic effect (Equation (8))
and the LOD threshold $T$ (Figure 7), a progeny size of
$T/\mathrm{ELOD}$ will ensure a 50% chance of detecting linkage to
such a QTL no matter where it lies in the genome. If it
is desired to increase the chance of success to $100\beta\%$,
standard arguments show that the progeny size should be
further increased by a factor of $[1 + (Z_{1-\beta}/Z_{\alpha'})]^2$, where
$\alpha'$ is the nominal significance level corresponding to a
LOD score of $T$ (Kendall, M. and A. Stuart, The Advanced
Theory of Statistics, Vol. 2, Griffin: London (1979)).
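The power adjustment can be sketched as follows. Because $T = \tfrac{1}{2}(\log_{10} e)(Z_{\alpha'})^2$, the sketch recovers $Z_{\alpha'}$ directly from the threshold as $\sqrt{(2\ln 10)\,T}$, which sidesteps any one- versus two-sided convention. For simplicity it uses the single-marker ELOD (5a). The function name and the example effect size are hypothetical.

```python
import math
from statistics import NormalDist

def progeny_for_power(threshold: float, ratio: float, power: float) -> float:
    """Progeny needed for the LOD to exceed `threshold` with probability `power`.

    ratio = sigma2_exp / sigma2_res; the ELOD per progeny comes from equation (5a).
    """
    elod = 0.5 * math.log10(1.0 + ratio)
    z_alpha = math.sqrt(2.0 * math.log(10.0) * threshold)   # Z_alpha' implied by T
    z_power = NormalDist().inv_cdf(power)                    # Z_{1-beta}: Pr(z > Z) = 1 - beta
    return threshold / elod * (1.0 + z_power / z_alpha) ** 2

for power in (0.5, 0.8, 0.95):
    print(f"power {power:.2f}: about {progeny_for_power(2.4, 0.05, power):.0f} progeny")
```

At a power of 0.5 the factor is 1, recovering $T/\mathrm{ELOD}$. At a power of 0.95 with $T = 2.4$, the progeny size grows by a factor of about 2.2.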

A technical note: The approximate progeny sizes
given above (Equations 3, 5ab, 6, 8ab) become exact in the
limit of QTLs with small effects. Slight modifications
are required for QTLs with large effects; see Appendix
[A4].

Increasing the Power of QTL Mapping

Although interval mapping increases the efficiency
of QTL mapping somewhat, large numbers of progeny may
still be required. Additional methods are available to
increase the power of QTL mapping, the most important of
which is selective genotyping.

Selective genotyping of the phenotypic extremes: Some
progeny contribute more linkage information than others.
As a general principle, the individuals that provide the
most linkage information are those whose genotype can be
most clearly inferred from their phenotype. For example,
Lander and Botstein have pointed out that the vast
majority of linkage information about human diseases with
incomplete penetrance comes from the affected
individuals; since the genotype of unaffected individuals
is uncertain, they provide relatively little information
(Lander, E.S. and D. Botstein, Cold Spring Harbor Symp.
Quant. Biol., 51:49-62 (1986)).

Applying this principle to quantitative genetics,
the highest ELODs are provided by the progeny that
deviate most from the phenotypic mean. When the cost of
growing progeny is less than the cost of complete RFLP
genotyping (as is frequently the case), it will thus be
more efficient to increase the number of progeny grown
but to genotype only those with the most extreme
phenotypes. This increase in efficiency can be estimated
as follows, with a more precise argument given in the
Appendix [A5]. Since regression minimizes squared
deviations from the mean, the ELOD conditional on an
individual's phenotype $\phi$ is proportional to $(\phi - \mu_{BC1})^2$.
Thus, the proportion of individuals with extreme
phenotype $\phi$ such that $|\phi - \mu_{BC1}| > L\sigma_{BC1}$ is

$$ Q(L) = 2\int_L^\infty z(x)\,dx, $$

while the proportion of the total linkage information
contributed by such individuals is

$$ S(L) = 2\int_L^\infty x^2 z(x)\,dx = Q(L)\left[1 + 2L\,z(L)/Q(L)\right] \approx Q(L)\left[1 + L^2\right], \qquad (9) $$

using integration by parts and the approximation
$z(L)/Q(L) \approx L/2$ for large $L$; here $z(x)$ denotes the
standard normal density. Accordingly, the same total
linkage information would be obtained by growing a
population that was larger by a factor of $h(L) = 1/S(L)$,
but only genotyping individuals with extreme phenotypes.
The number of progeny to genotype would fall by a factor
of $g(L) = S(L)/Q(L) \approx 1 + L^2$. Graphs of $Q(L)$, $S(L)$, $h(L)$
and $g(L)$ are shown in Figure 8. Results show that:

(i) Progeny with phenotypes more than 1 standard
deviation from the mean comprise about 33% of the total
population but contribute about 81% of the total linkage
information. By growing a population that was only about
25% larger and genotyping only these extreme progeny, the
same total linkage information would be obtained from
genotyping only about 40% as many individuals.

(ii) Progeny with phenotypes more than 2 standard
deviations from the mean comprise about 5% of the total
population but contribute about 28% of the total linkage
information. By growing a population that was about
3.6-fold larger and genotyping only these extreme
progeny, the same total linkage information would be
obtained from genotyping about 5.5-fold fewer individuals
(since $h(2) \approx 3.6$ and $g(2) \approx 5.5$).

(iii) It is probably unwise to go beyond the 5%
tails of the distribution. From a practical point of view,
the most extreme phenotypic outliers may represent
artifacts. Moreover, for $L > 2$ the increase in population
size required outweighs the decreased number of
individuals to genotype.

The strategy of selective genotyping will
substantially increase efficiency whenever growing and
phenotyping additional progeny requires less effort than
completely genotyping individuals at all RFLP markers,
which is the case in many organisms.
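The quantities in (9) can be evaluated directly. This Python sketch uses the exact expression $S(L) = Q(L) + 2L\,z(L)$ from the integration by parts, checks it against a numerical integral, and reports $h(L) = 1/S(L)$ and $g(L) = S(L)/Q(L)$. The helper names mirror the text's notation but are otherwise illustrative.

At $L = 1$ the sketch gives $Q \approx 0.32$, $S \approx 0.80$, $h \approx 1.25$ and $g \approx 2.5$, close to item (i). At $L = 2$ it gives $h \approx 3.8$ and $g \approx 5.7$, somewhat above the values quoted in item (ii). Formula (9) is only the approximate small-effect version of the argument in Appendix [A5].

```python
import math

def z(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Q(L: float) -> float:
    """Fraction of progeny with |phenotype - mean| > L standard deviations."""
    return math.erfc(L / math.sqrt(2.0))

def S(L: float) -> float:
    """Fraction of the linkage information from those progeny (exact form in (9))."""
    return Q(L) + 2.0 * L * z(L)

def S_numeric(L: float, steps: int = 20000, upper: float = 12.0) -> float:
    """Trapezoid-rule value of 2 * integral_L^upper x^2 z(x) dx, as a check on (9)."""
    h = (upper - L) / steps
    total = 0.5 * (L * L * z(L) + upper * upper * z(upper))
    total += sum((L + k * h) ** 2 * z(L + k * h) for k in range(1, steps))
    return 2.0 * h * total

for L in (1.0, 2.0):
    q, s = Q(L), S(L)
    print(f"L = {L}: Q = {q:.3f}, S = {s:.3f} (numeric {S_numeric(L):.3f}), "
          f"h = {1 / s:.2f}, g = {s / q:.2f} (approx. {1 + L * L:.0f})")
```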

It should be noted that standard computer programs
for linear regression cannot be used (even for single
marker analysis) when only the extreme progeny have been
genotyped; phenotypic effects would be grossly
overestimated because of the biased selection of progeny.
As in the case of interval mapping, missing-data methods
are required (Little, R.J.A. and D.B. Rubin,
Statistical Analysis with Missing Data, Wiley, NY
(1987)). Conveniently, the maximum likelihood methods
discussed above will produce the correct results,
provided that the phenotypes are recorded for all
progeny: genotypes for the non-extreme progeny may
simply be entered as missing. Using the MAPMAKER-QTL
program, the method has been applied to both simulated
and experimental data sets.
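A small simulation in Python illustrates the bias described above and its repair by treating ungenotyped progeny as missing. Everything in it is an assumption made for illustration:

- a backcross QTL with effect 0.5, located at a marker;
- unit environmental variance;
- genotyping of only those progeny more than one standard deviation from the mean;
- a compact EM for likelihood (7) in which ungenotyped individuals get $G_i(1) = \tfrac{1}{2}$.

It is a sketch of the idea, not the MAPMAKER-QTL implementation.

```python
import math
import random

def pdf(x: float, var: float) -> float:
    """Normal density with mean 0 and variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_effect(phen, prior1, iters=300):
    """EM estimate of b in likelihood (7); prior1[i] = G_i(1) (0.5 if not genotyped)."""
    n = len(phen)
    a = sum(phen) / n
    var = sum((y - a) ** 2 for y in phen) / n
    b = 0.0
    for _ in range(iters):
        w = []                                   # E-step: posterior Pr(g_i = 1)
        for y, p in zip(phen, prior1):
            l1 = p * pdf(y - a - b, var)
            w.append(l1 / ((1 - p) * pdf(y - a, var) + l1))
        s1 = sum(w)                              # M-step: weighted means, pooled variance
        a = sum((1 - wi) * y for wi, y in zip(w, phen)) / (n - s1)
        b = sum(wi * y for wi, y in zip(w, phen)) / s1 - a
        var = sum((1 - wi) * (y - a) ** 2 + wi * (y - a - b) ** 2
                  for wi, y in zip(w, phen)) / n
    return b

rng = random.Random(7)
true_b = 0.5
geno = [rng.randint(0, 1) for _ in range(1000)]
phen = [true_b * g + rng.gauss(0.0, 1.0) for g in geno]
mu = sum(phen) / len(phen)
sd = math.sqrt(sum((y - mu) ** 2 for y in phen) / len(phen))
extreme = [abs(y - mu) > sd for y in phen]       # genotype only these progeny

# Naive analysis: compare class means among the genotyped extremes only.
hi = [y for y, g, e in zip(phen, geno, extreme) if e and g == 1]
lo = [y for y, g, e in zip(phen, geno, extreme) if e and g == 0]
naive_b = sum(hi) / len(hi) - sum(lo) / len(lo)

# Maximum likelihood: keep every phenotype and enter missing genotypes as G_i(1) = 0.5.
prior1 = [float(g) if e else 0.5 for g, e in zip(geno, extreme)]
ml_b = em_effect(phen, prior1)
print(f"true effect {true_b}, naive {naive_b:.2f}, maximum likelihood {ml_b:.2f}")
```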

Decreasing environmental variance via progeny
testing: As shown above, the number of progeny needed to
map a QTL is proportional to

$$ \sigma^2_{res}/\sigma^2_{exp} = \left[(\sigma^2_G + \sigma^2_E)/\sigma^2_{exp}\right] - 1. $$

Typically, the environmental variance exceeds the genetic
variance. If $\sigma^2_E$ could be reduced, QTL mapping would
become considerably more efficient. If the environmental
noise results from measurement error, one might either
average replicate measurements or try to develop a better
assay. More often, environmental noise results from
actual physiological differences between genetically
identical individuals. In this case, $\sigma^2_E$ can be reduced
through progeny testing: an individual's phenotype would
be taken to be the average phenotype of $n$ of its B2
backcross offspring. The variance of this average is
$\sigma^2_{E'} = (1/n)\left[\tfrac{1}{2}\sigma^2_G + \sigma^2_E\right]$, which will be less than $\sigma^2_E$
except for very small $n$.
Simultaneous search: Just as environmental noise
can be decreased via progeny testing, genetic noise can
be reduced by simultaneously studying several intervals
containing QTLs. If the genetic variance is large, such
an approach may further decrease the number of progeny
required. The extension of interval mapping to such
simultaneous search, the question of the appropriate LOD
threshold when considering sets of intervals, and the
approximate increase in the power of QTL mapping are
discussed in the Appendix [A6].

Although the discussion above concerns the backcross, it
applies directly to F2 intercrosses and recombinant
inbred strains, with the following modifications:

(i) In an F2 intercross, a QTL with phenotypic
effect $\delta$ contributes variance $\delta^2/8$, and thus Wright's
formula (2) becomes $k = D^2/8\sigma^2_G$. Since F2 intercrosses
provide information about twice as many meioses as
backcrosses of the same size, fewer progeny are required
for detecting QTLs having purely additive effects: only
50-60% as many progeny are needed, depending on the
density of the markers used. If a QTL is partly
dominant, one of the two backcrosses will be more
efficient and the other less efficient for mapping it.
The magnitude of dominance effects can be estimated by
explicitly incorporating them into the maximum likelihood
analysis via an additional parameter.

(ii) Recombinant inbred strains are analyzed in the
same manner as backcrosses, except that the
multi-generational breeding scheme that is used to
construct recombinant inbred strains increases the
effective genetic length of the genome. Compared to a
backcross, the density of crossovers is doubled in a
recombinant inbred produced through selfing and is
quadrupled in a recombinant inbred produced by sib
mating (Haldane, J.B.S. and C.H. Waddington, Genetics,
16:357-374 (1931)). A genetic length of 2G or 4G must be
used in place of G when computing the appropriate LOD
threshold, which leads to an increase of about 0.3 or
0.6, respectively, in the threshold required. Although
the higher threshold will increase the number of progeny
required, the effect is typically offset by the ability
to decrease the number of progeny by reducing the
environmental variance through replicate phenotypic
measurements within each recombinant inbred strain (cf.
progeny testing above). Recombinant inbred strains will
thus typically be more efficient for QTL mapping than an
equal number of backcross progeny. However, this
advantage may often be negated by the considerable time
and effort required to construct large numbers of such
strains.
Although the Results section is mathematical in
parts, the Discussion presents the methodology in terms
of explicit graphs that allow a geneticist to design
crosses to dissect a quantitative trait by using a
complete RFLP linkage map.



Equivalents

Those skilled in the art will recognize, or be able
to ascertain using no more than routine experimentation,
many equivalents to the specific embodiments of the
invention described specifically herein. Such
equivalents are intended to be encompassed in the scope
of the following claims.

APPENDIX 

[A1] To prove Proposition 1, we use the following lemma.

Lemma. Let $x_1, \ldots, x_n > 0$. For $y \ge 0$, let $s_y = \sum' x_i$ and $t_y = \sum' x_i^2$, where the sum is
taken over the terms with $x_i > y$. If $t_0/s_0 > y$, then

$$ s_y \ge \tfrac{1}{2}\left[y + \sqrt{y^2 + 4(t_0 - y s_0)}\right] \quad\text{and}\quad t_y \ge t_0 - y(s_0 - s_y). $$

Proof: From the definitions and the non-negativity of the $x_i$, it is clear that

$$ s_y^2 \ge t_y \ge t_0 - y(s_0 - s_y). $$

(The second inequality holds because each omitted term satisfies $x_i^2 \le y\,x_i$.) The constraint on $s_y$
then follows by considering the outer terms and applying the quadratic formula. □

In the context of Proposition 1, suppose that the QTLs in the high strain
change the phenotype by $x_1, \ldots, x_n > 0$, respectively. Using the notation above, we
have $D = s_0$ and $\sigma^2_G = t_0/16 = D^2/16k$ (because of non-linkage among QTLs and
Wright's formula). Taking $y = \epsilon(D/k)$, the result then follows from the lemma
since $D_\epsilon = s_y/s_0$ and $V_\epsilon = t_y/t_0$.
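The lemma is easy to spot-check numerically. The Python sketch below draws random positive values $x_i$ and thresholds $y$ with $t_0/s_0 > y$, then confirms both inequalities. It is a sanity check under these random draws, not a substitute for the proof. The helper name is illustrative.

```python
import math
import random

def lemma_holds(xs, y, eps=1e-9):
    """Check both inequalities of the lemma for positive xs and threshold y."""
    s0, t0 = sum(xs), sum(x * x for x in xs)
    assert t0 / s0 > y, "the lemma requires t_0/s_0 > y"
    big = [x for x in xs if x > y]               # terms entering s_y and t_y
    sy, ty = sum(big), sum(x * x for x in big)
    bound_s = 0.5 * (y + math.sqrt(y * y + 4.0 * (t0 - y * s0)))
    return sy >= bound_s - eps and ty >= t0 - y * (s0 - sy) - eps

rng = random.Random(0)
checked = 0
for _ in range(10000):
    xs = [rng.expovariate(1.0) for _ in range(rng.randint(1, 20))]
    ratio = sum(x * x for x in xs) / sum(xs)
    y = 0.999 * rng.random() * ratio             # keeps y strictly below t_0/s_0
    if lemma_holds(xs, y):
        checked += 1
print(f"lemma held in {checked} of 10000 random cases")
```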

[A2] Suppose that a QTL lies midway between two flanking markers. Let $\theta$ be the
recombination fraction between the QTL and either marker and $\Theta = 2\theta(1-\theta)$ the
recombination fraction between the two markers (ignoring interference). In
meioses in which the markers have not recombined (a proportion $1-\Theta$ of the total), the
flanking markers act as a single virtual marker linked at recombination fraction $\gamma$,
where $\gamma$ is the chance that the QTL recombines with both markers given that the
markers themselves have not recombined. By contrast, meioses in which the
flanking markers have recombined provide zero information about linkage of the
QTL. The ELOD for interval mapping is thus $(1-\Theta)$ times the ELOD for a single
marker linked at $\gamma$, which in turn is $(1-2\gamma)^2$ times the ELOD for a marker at zero
recombination. That is,

$$ \mathrm{ELOD}_{\text{interval mapping}} = (1-\Theta)(1-2\gamma)^2\,\mathrm{ELOD}. $$

Using the relation $\gamma = \theta^2/(1-\Theta)$ and simplifying terms, Equation (8a) follows.
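The simplification can be written out explicitly. Without interference, recombination on both sides of the QTL has probability $\theta^2$ and leaves the markers non-recombinant, which gives $\gamma = \theta^2/(1-\Theta)$. Since $1 - \Theta = 1 - 2\theta + 2\theta^2$,

$$
\begin{aligned}
1 - 2\gamma &= \frac{1 - \Theta - 2\theta^2}{1 - \Theta} = \frac{1 - 2\theta}{1 - \Theta},\\
(1-\Theta)(1-2\gamma)^2\,\mathrm{ELOD} &= \frac{(1-2\theta)^2}{1-\Theta}\,\mathrm{ELOD},
\end{aligned}
$$

which is (8a). Dividing by (8b) shows that interval mapping gains exactly a factor $1/(1-\Theta)$ over a single marker at the same distance from the QTL.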




[A3] In the idealized dense-map case, suppose that markers are available at every 
point along a chromosome. Suppose that there are no QTLs in the genome. For 
individual /, the phenotype 6. = N(0,1); that is 6. is a random normal variable with 
mean 0 and variance 1. For individual /, let x.(d) denote the genotype at a position d 
05 cM from the Ie ft end of the chromosome (.r. = 0 or 1 according to the allele 

inherited), let 0'(d) denote the maximum likelihood estimate of the phenotypic 
effect of a putative QTL at this position, and let LOD(d) denote the corresponding 
LOD score. By standard formulas for linear regression. 



10 



20 



25 



--ZW.-«(.-c.-.r)/(.-c.-.v)* 

where 4> and x are the means of 4>. and x, respectively. For a large population of size 
n, the central limit theorem implies that 



fi'(d) - £ 4^.(.r.-f), 
v{d) ■- Jnfl'(d) - ,V(0,I), 
^re,^ = T.te i -P*Wx-<iet)) 1 ~ »{l-S'(d))\ 
15 a %x P (£/) ~ "WW and 

LOD{d) - \{los l0 ^\ xp {d)/a^Jd)) 
~ \(lo- 10 e)WnH-{d)Y. 

Thus, LOD(d) is asymptotically proportional to the square of a random normal 
variable v(d) (which incidentally proves that LOD is proportional to x z ). 

Let $r(a,b)$ denote the correlation coefficient for any two random variables $a$
and $b$. Let $d_1$ and $d_2$ denote points on the chromosome and write $d = d_1 - d_2$. From
the asymptotic expression for $\hat\beta(d)$ above, it follows that

$$r(d) := r\big(v(d_1), v(d_2)\big) \approx 1-2\theta,$$

where $\theta$ is the recombination fraction corresponding to the genetic distance $d =
|d_1-d_2|$. Assuming Haldane's map function (with $d$ measured in Morgans), $1-2\theta = e^{-2d}$.
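Because $v(d)$ is asymptotically a fixed linear combination of the centred genotypes $x_i(d)-\bar x$, weighted by the same phenotypes at every position, its correlation between two positions equals the correlation of the 0/1 genotypes themselves. The sketch below estimates that correlation by simulating backcross genotypes at two loci (a crossover between them occurs with probability $\theta$ in each meiosis, no interference assumed) and compares it with $1-2\theta$ and with $e^{-2d}$ under Haldane's map function. Function names are illustrative.

```python
import math
import random


def haldane_theta(d: float) -> float:
    """Haldane map function: recombination fraction for a distance of d Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def genotype_correlation(theta: float, n: int, seed: int = 0) -> float:
    """Empirical correlation of 0/1 backcross genotypes at two loci that
    recombine with probability theta, over n independent meioses."""
    rng = random.Random(seed)
    x1 = [rng.randint(0, 1) for _ in range(n)]
    # The second locus carries the other allele only if a crossover intervenes.
    x2 = [1 - a if rng.random() < theta else a for a in x1]
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / n
    var1 = sum((a - m1) ** 2 for a in x1) / n
    var2 = sum((b - m2) ** 2 for b in x2) / n
    return cov / math.sqrt(var1 * var2)


d = 0.2  # 20 cM expressed in Morgans
theta = haldane_theta(d)
print(1 - 2 * theta, math.exp(-2 * d), genotype_correlation(theta, 100_000))
```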

To summarize, $v(d)$ is a stationary normal process with covariance function
$r(d) = e^{-2|d|}$. Up to rescaling $d$ by a factor of 2, this is the definition of the Ornstein-Uhlenbeck
diffusion, and Proposition 2 follows directly (see Leadbetter, Lindgren
and Rootzén 1983, Theorem 12.2.9 and the discussion following it).

While only Haldane's map function yields precisely an Ornstein-Uhlenbeck
diffusion, the proof of Proposition 2 holds in general. The relevant results in
Leadbetter, Lindgren and Rootzén (1983) require only that $r(d) = 1-2d+o(d)$ as
$d \to 0$, which holds for all map functions.

[A4] If QTLs with very large effects are segregating, regression analysis is not strictly
appropriate (whether in the traditional approach or in the generalization developed
in the text) because the phenotypic distribution eventually becomes bimodal. In the
extreme, the assumption that the phenotypes are generated by the segregation of a
QTL will always fit the data much better than the assumption that they follow a
normal distribution, even for loci unlinked to any QTLs. Consequently, expressions
(3) and (6) fail to give the correct number of progeny required in the case of QTLs
with very large effects: indeed, they tend to zero, whereas a minimum positive
number of progeny is obviously needed no matter how large the effect of the QTL.

To achieve the correct limiting behaviour, the LOD score for a marker at 0 cM can
be redefined as the $\log_{10}$ of

$$\frac{L(a,b,\sigma^2)}{\tfrac12\left[L(a,b,\sigma^2)+L(a,-b,\sigma^2)\right]}$$

with $L(a,b,\sigma^2)$ defined in (4). This ratio measures how much more likely the data are
to have been generated by a QTL with the hypothesized effect located at the marker
locus than by a QTL with this same effect but unlinked to the marker. The ELOD
can be found by numerical integration over the distribution for $\phi$. In the limit of a
QTL with large effect, the expression tends to the traditional LOD score for a
qualitative trait used in human genetics.

For the QTLs likely to be encountered in practice, this correction is
irrelevant. We have used it in computing the number of progeny required in Figures
5 and 6, however, so that these graphs exhibit the correct limiting behaviour
rather than tending to zero.

[A5] For notational convenience, rescale the phenotype so that its mean in the
backcross is 0, and encode the two alternative genotypes by the indicator
variable $g = -1$ or 1 (rather than 0 or 1, as in the text). Given a true QTL, let $2b$ be
the amount by which substituting an allele increases the phenotype, and let $\sigma^2$ be the
residual variance unexplained by the QTL out of the total backcross variance $\Sigma^2 =
\sigma^2 + b^2$. Suppose that a marker is located exactly at the QTL. Conditional on the
phenotype $\phi$ of an individual but unconditional on its genotype $x$ at the marker, the
LOD score (comparing the true hypothesis $H_1(0,b,\sigma^2)$ to the alternative $H_0(0,0,\Sigma^2)$)
is



$$\mathrm{LOD}_\phi = \sum_{x=\pm1} P(g=x\mid\phi)\,\log_{10}\!\left[\frac{z(\phi-bx,\sigma^2)}{z(\phi,\Sigma^2)}\right],$$

where $P(g=x\mid\phi)$ is the probability that the individual has marker genotype $x$ given its
phenotype $\phi$, given by

$$P(g=x\mid\phi) = \frac{z(\phi-bx,\sigma^2)}{z(\phi-b,\sigma^2)+z(\phi+b,\sigma^2)}.$$
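The sketch below evaluates $\mathrm{LOD}_\phi$ directly from the two formulas just given, taking $z(s,\sigma^2)$ to be the normal density with mean 0 and variance $\sigma^2$ (an assumption, since $z$ is defined elsewhere in the text) and $\Sigma^2 = \sigma^2 + b^2$. Working with log-densities avoids underflow for extreme phenotypes. For a small effect $b$, the computed values grow in proportion to $\phi^2$. Function names are hypothetical.

```python
import math


def log_z(s: float, var: float) -> float:
    """Natural log of the N(0, var) density at s (assumed meaning of z)."""
    return -0.5 * math.log(2.0 * math.pi * var) - s * s / (2.0 * var)


def lod_phi(phi: float, b: float, sigma2: float) -> float:
    """Expected LOD from one backcross individual with phenotype phi when the
    marker sits on a QTL with genotype means -b and +b (g = -1 or 1).

    Compares H1(0, b, sigma2) with H0(0, 0, Sigma2), Sigma2 = sigma2 + b**2.
    """
    big_sigma2 = sigma2 + b * b
    log_h0 = log_z(phi, big_sigma2)
    log_h1 = {g: log_z(phi - b * g, sigma2) for g in (-1, 1)}
    # P(g = x | phi) = z(phi - b x) / [z(phi - b) + z(phi + b)], computed stably
    top = max(log_h1.values())
    weights = {g: math.exp(v - top) for g, v in log_h1.items()}
    norm = sum(weights.values())
    return sum(
        (weights[g] / norm) * (log_h1[g] - log_h0) / math.log(10.0)  # ln -> log10
        for g in (-1, 1)
    )


for phi in (0.5, 1.0, 2.0, 3.0):
    print(phi, lod_phi(phi, b=0.01, sigma2=1.0))
```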

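As claimed in the text, if $b$ is small, $\mathrm{LOD}_\phi$ is proportional to $\phi^2$. Now, the
probability distribution for $\phi$ has density

$$f(\phi) = \tfrac12\left[z(\phi-b,\sigma^2)+z(\phi+b,\sigma^2)\right].$$

Conditional on the phenotype of a backcross progeny deviating from the mean by more than
$L\Sigma$, the LOD score is

$$\mathrm{LOD}_{|\phi|>L\Sigma} = \int_{|\phi|>L\Sigma}\mathrm{LOD}_\phi\,f(\phi)\,d\phi.$$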

Letting $v = b^2/\Sigma^2$ denote the fraction of variance explained by the QTL,
straightforward although tedious integration shows that

$$S(L) = \mathrm{LOD}_{|\phi|>L\Sigma}\,/\,\mathrm{LOD}_{|\phi|>0} \approx Q(L)\left[1 + 2uLz(L)/Q(L)\right], \qquad (10)$$

where $u = -v/\log_e(1-v) \approx 1-\tfrac12 v$ and where the approximation in (10) is $O(v^2)$ for
small $v$. For QTLs with small effects, this reduces to Equation (9).
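For small effects, $\mathrm{LOD}_\phi$ is proportional to $\phi^2$, so $S(L)$ reduces to the share of $E[\phi^2]$ carried by progeny with $|\phi| > L$ (phenotype measured in units of $\Sigma$). The sketch below checks this numerically against $Q(L) + 2Lz(L)$, which is (10) with $u = 1$. It assumes that $Q(L)$ denotes the two-sided tail probability $P(|\phi|>L)$ and $z$ the standard normal density; neither is defined in this appendix, and the function names are illustrative.

```python
import math


def z(t: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def q_two_sided(L: float) -> float:
    """P(|phi| > L) for standard normal phi (assumed meaning of Q(L))."""
    return math.erfc(L / math.sqrt(2.0))


def s_small_effect(L: float) -> float:
    """Equation (10) with u = 1: Q(L) * [1 + 2 L z(L) / Q(L)] = Q(L) + 2 L z(L)."""
    return q_two_sided(L) + 2.0 * L * z(L)


def s_numeric(L: float, upper: float = 12.0, steps: int = 100_000) -> float:
    """Share of E[phi^2] (= 1) contributed by |phi| > L, by the midpoint rule.

    With LOD_phi proportional to phi^2, this is the share of the total expected
    LOD retained by genotyping only progeny beyond L (in units of Sigma).
    """
    h = (upper - L) / steps
    one_tail = h * sum(
        (L + (k + 0.5) * h) ** 2 * z(L + (k + 0.5) * h) for k in range(steps)
    )
    return 2.0 * one_tail  # the two tails are symmetric


for L in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(L, round(s_numeric(L), 6), round(s_small_effect(L), 6))
```

For example, with $L = 1$ the roughly 32% of progeny in the two tails retain about 80% of the expected LOD.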

[A6] Interval mapping can be straightforwardly extended to the case of multiple
intervals explaining a quantitative phenotype: for $m$ intervals, the bracketed term in
Equation 7 becomes a sum with $2^m$ terms corresponding to the possible joint
genotypes at the $m$ putative QTLs. Since simultaneous consideration of multiple
QTLs reduces the unexplained variance, it may be somewhat easier to detect
linkage to the set of loci than to any one individually (cf. Lander and Botstein
1986a,b). The subtle issue is the appropriate threshold for a simultaneous search for
$m$ QTLs. In a genome with no QTLs, how high a LOD score might occur by
chance? For any particular choice of putative QTLs, the LOD score is
asymptotically distributed as $\chi^2$ with $m$ degrees of freedom. When considering an
entire genome, the LOD score follows a mathematical process known as a $\chi^2$
random field (Adler 1981), about which somewhat less is known than about the
Ornstein-Uhlenbeck diffusion. Approximate arguments show that the level of the
highest excursion of such a $\chi^2$ random field on an entire genome is about $m$-fold
higher than the corresponding level for an Ornstein-Uhlenbeck diffusion on the
genome. If $m$ QTLs have equal effects, then simultaneous search decreases the
number of progeny required to achieve statistical significance by a factor of about
$(1-m\sigma^2)/(1-\sigma^2)$, where $\sigma^2$ is the fraction of variance explained by each. If the QTLs
have unequal effects, it may become possible to detect those with smaller effects by
first controlling for those with larger effects. We will discuss simultaneous search
for QTLs in more detail elsewhere.
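A minimal sketch of the $2^m$-term sum for a single backcross individual follows. Equation 7 is not reproduced in this appendix, so the details here are assumptions: each putative QTL $j$ carries genotype 1 with a probability `p_qtl[j]` (hypothetically derived from its flanking markers), the intervals are unlinked so that joint genotype probabilities multiply, and effects add. The second helper simply evaluates the factor $(1-m\sigma^2)/(1-\sigma^2)$ quoted above. All names are hypothetical.

```python
import itertools
import math


def normal_density(s: float, var: float) -> float:
    """N(0, var) density at s."""
    return math.exp(-s * s / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def multi_qtl_likelihood(phi, p_qtl, a, effects, sigma2):
    """Mixture likelihood of one phenotype phi under m putative QTLs.

    p_qtl[j]  -- hypothetical probability, given the flanking markers of
                 interval j, that the individual has genotype 1 at QTL j
    a         -- mean phenotype when every QTL genotype is 0
    effects   -- additive effect b_j of genotype 1 at QTL j
    Assumes unlinked intervals, so joint genotype probabilities multiply.
    """
    total = 0.0
    for g in itertools.product((0, 1), repeat=len(p_qtl)):  # all 2**m joint genotypes
        prob = math.prod(p if gj else 1.0 - p for p, gj in zip(p_qtl, g))
        mean = a + sum(b * gj for b, gj in zip(effects, g))
        total += prob * normal_density(phi - mean, sigma2)
    return total


def progeny_reduction_factor(m: int, var_each: float) -> float:
    """(1 - m * var_each) / (1 - var_each), the approximate factor in the text."""
    return (1.0 - m * var_each) / (1.0 - var_each)


print(multi_qtl_likelihood(0.4, [0.9, 0.2], a=0.0, effects=[0.5, 0.3], sigma2=1.0))
print(progeny_reduction_factor(3, 0.05))
```

With $m = 1$ the sum collapses to the familiar two-component mixture for a single interval.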






CLAIMS 

1. A method of mapping genomic regions containing
polygenic factors controlling quantitative trait
loci comprising the steps of:

a. crossing two strains of interest, the strains
differing as to at least one trait of interest,
to produce progeny;

b. carrying out one or more crosses, which are
either back-crosses or intercrosses, to produce
progeny;

c. scoring progeny of the one or more crosses for
at least one trait of interest and for selected
genetic markers, the genetic markers comprising
a genetic linkage map; and

d. applying an algorithm designed to maximize the
likelihood of the trait of interest to the
results of scoring progeny as in step (c).

2. A method of mapping quantitative trait loci in the
genome of a higher plant, comprising the steps of:

a. crossing two interfertile strains of the higher
plant, the two strains differing as to at least
one trait of interest, to produce progeny;

b. carrying out one or more back-crosses of one of
the two interfertile strains and progeny
produced in step (a), to produce progeny;

c. scoring progeny of the back-crosses for at
least one trait of interest and for selected
genetic markers, the genetic markers comprising
a genetic linkage map; and

d. applying an algorithm designed to maximize the
likelihood of the trait of interest to the
result of scoring progeny as in step (c).

3. A method of genetic dissection of a quantitative
trait, comprising the steps of: making an
appropriate interspecies back-cross which will
detect a specific minimum phenotypic effect and
determining the presence or absence of the minimum
phenotypic effect.

4. A method of mapping genomic regions containing
polygenic factors controlling at least one
quantitatively inherited trait of interest,
comprising the steps of:

a. making a back-cross of two species which differ
in at least one quantitatively inherited trait
of interest, to produce back-cross progeny;

b. scoring back-cross progeny as to the occurrence
of selected restriction fragment length
polymorphisms;

c. constructing a genetic linkage map using the
information resulting from step (b); and

d. constructing an interval map of quantitative
trait loci.

5. A method of designing a cross for genetic dissection
of a quantitative trait, comprising the steps of:

a. selecting two strains differing for a
quantitative trait of interest, the two strains
having a phenotypic difference D which is large
compared to an environmental variance $\sigma^2_E$, and a
backcross between the two strains having a
relatively small number K of QTLs
segregating therein;

b. specifying a minimum phenotypic effect S that
the cross of the two strains will be designed
to detect, the specific minimum phenotypic
effect S ensuring that QTLs accounting for a
substantial amount of the phenotypic difference
D will be detected; and

c. genotyping a number N of the backcross progeny
of the two strains, where N is calculated from a
function of the spacing between genetic markers in
a map, a threshold T for an LOD score, and a
desired probability that a false positive occurs;
the number N decreasing for an increasing fraction
of genetic variance V in the backcross explained by
the QTLs that account for a substantial amount of
the phenotypic difference D, and the number N
decreasing for smaller K with larger D.

6. A method of Claim 5 wherein within progeny of a
backcross of one of the strains a variance of
the phenotype is defined by

$$\sigma^2_{B1} = \sigma^2_G + \sigma^2_E$$

where $\sigma^2_G$ is genetic variance.



7. A method of Claim 6 wherein K is defined by

K



8. A method of Claim 5 wherein the step of specifying a
minimum phenotypic effect S includes choosing a
value of S which is from approximately ½(D/K) to
approximately (D/K).

9. A method of Claim 5 wherein the genetic variance V
explained by the QTL is defined by

$$V = \frac{S^2}{16\,\sigma^2_{B1}},$$

where $\sigma^2_{B1}$ is the variance of the phenotype in
progeny of a backcross of one of the selected
strains.

10. A method of Claim 5, further comprising the step of: 
mapping QTLs when the phenotypic difference D 
measured in environmental standard deviations is on 
the order of the number K. 



[Figure: frequency histograms of fruit mass (grams), soluble solids (°Brix) and pH. Mean (S.D.): fruit mass, E 65.8 (10.3), BC 17.5 (7.6); soluble solids, E 4.78 (0.53), BC 7.60 (1.76); pH, E 4.27 (0.06), BC 4.33 (0.18).]



[Figure: frequency histogram of percent recurrent parent genome (50 to 100%); Min = 58.9%, Mean = 75.0%, Max = 90.3%.]

[Figure: LOD-score plot along a chromosome; markers include TG24, CD15, CD24, TG59, TG21, TG19, TG17 and TG27; labelled effect -5.4 g.]



[Figure: Chromosome 3, LOD score (0 to 8) along the chromosome; markers include R45S, TG1B, CAB1, CD35, RBCS3, CD66, CD13A, TG42, CD4A and CD71; labelled effects 0.94 °Brix and -0.12 pH.]



[Figure: LOD-score plot; markers include CD59, TP12, TG2, CD55, CD39, TG22, CD41, TG23, CD78 and CAB6; labelled effect +0.91 °Brix.]



[Figure: LOD-score plot; markers include CD67, SOD3, TG54, CD42, SP and PC5; labelled effects on °Brix, +0.17 pH and -4.6 g.]



[Figure: LOD-score plot (0 to 8); markers include CD61, TG23, TG61, TG113, TG45, TG16, CAB2, CD46 and CD29A; labelled effects +0.83 °Brix, +0.096 pH, -3.7 g and 0.14 pH.]



[Figure: LOD-score plot; markers include CD56, CD38A, CD34B, CD34A, TG63 and CD32B; labelled effect 0.15 pH.]










[Figure: LOD-score plots (0 to 3) for Chromosome 10 and Chromosome 11.]






[Figure: plot against spacing between RFLPs (in cM), 0 to 20.]






[Figure: plot against fraction of backcross variance explained, 0.00 to 0.25.]



INTERNATIONAL SEARCH REPORT 



International Application No. PCT/US 89/04688

I. CLASSIFICATION OF SUBJECT MATTER (if several classification symbols apply, indicate all)



According to International Patent Classification (IPC) or to both National Classification and IPC

IPC5: C 12 Q 1/68, A 01 H 1/04 



II. FIELDS SEARCHED 



Minimum Documentation Searched

Classification System | Classification Symbols

IPC5 | C 12 Q

Documentation Searched other than Minimum Documentation
to the Extent that such Documents are Included in the Fields Searched



III. DOCUMENTS CONSIDERED TO BE RELEVANT

Category* | Citation of Document, with indication, where appropriate, of the relevant passages | Relevant to Claim No.

P,X | NATURE, Vol. 335, 1988, Andrew H. Paterson et al.: "Resolution of quantitative traits into Mendelian factors by using a complete linkage map of restriction fragment length polymorphisms", see page 721 - page 726 | 1-10

    | Genetics, Vol. 116, 1987, M.D. Edwards et al.: "Molecular-Marker-Facilitated Investigations of Quantitative-Trait Loci in Maize. I. Numbers, Genomic Distribution and Types of Gene Action", see page 113 - page 125; see especially page 114, column 2, line 14 - page 117, column 2, line 4 | 1-10



* Special categories of cited documents:

"A" document defining the general state of the art which is not
considered to be of particular relevance

"E" earlier document but published on or after the international
filing date

"L" document which may throw doubts on priority claim(s) or
which is cited to establish the publication date of another
citation or other special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or
other means

"P" document published prior to the international filing date but
later than the priority date claimed

"T" later document published after the international filing date
or priority date and not in conflict with the application but
cited to understand the principle or theory underlying the
invention

"X" document of particular relevance; the claimed invention
cannot be considered novel or cannot be considered to
involve an inventive step

"Y" document of particular relevance; the claimed invention
cannot be considered to involve an inventive step when the
document is combined with one or more other such docu-
ments, such combination being obvious to a person skilled
in the art.

"&" document member of the same patent family



IV. CERTIFICATION 



Date of the Actual Completion of the International Search

4tfv February 1990 



International Searching Authority

EUROPEAN PATENT OFFICE 



Date of Mailing of this International Search Report

07.03.90

Form PCT/ISA/210 (second sheet) (January 1985)



International Application No. PCT/US 89/04688

III. DOCUMENTS CONSIDERED TO BE RELEVANT (CONTINUED FROM THE SECOND SHEET)

Category* | Citation of Document, with indication, where appropriate, of the relevant passages | Relevant to Claim No.

    | BIOMETRICS, Vol. 42, 1986, J.I. Weller: "Maximum Likelihood Techniques for the Mapping and Analysis of Quantitative Trait Loci with the Aid of Genetic Markers", see page 627 - page 640; see especially page 628, line 22 - page 633, line 19 | 1-10

P,A | WO, A1, 89/07647 (PIONEER HI-BRED INTERNATIONAL, INC.) 24 August 1989, see the whole document | 1-10



Form PCT/ISA/210 (extra sheet) (January 1985)



International Application No. PCT/US 89/04688

FURTHER INFORMATION CONTINUED FROM THE SECOND SHEET

V. OBSERVATIONS WHERE CERTAIN CLAIMS WERE FOUND UNSEARCHABLE

This international search report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. Claim numbers 1-10, because they relate to subject matter not required to be searched by this Authority, namely:

Essentially biological processes for the production of
plants as well as mathematical methods, cf. PCT Rule 39.1

2. Claim numbers ..., because they relate to parts of the international application that do not comply with the prescribed requirements to such an extent that no meaningful international search can be carried out, specifically:

3. Claim numbers ..., because they are dependent claims and are not drafted in accordance with the second and third sentences of PCT Rule 6.4(a).



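VI. OBSERVATIONS WHERE UNITY OF INVENTION IS LACKING

This International Searching Authority found multiple inventions in this international application as follows:

1. As all required additional search fees were timely paid by the applicant, this international search report covers all searchable claims of the international application.

2. As only some of the required additional search fees were timely paid by the applicant, this international search report covers only those claims of the international application for which fees were paid, specifically claims:

3. No required additional search fees were timely paid by the applicant. Consequently, this international search report is restricted to the invention first mentioned in the claims; it is covered by claim numbers:

4. As all searchable claims could be searched without effort justifying an additional fee, the International Searching Authority did not invite payment of any additional fee.

Remark on Protest
The additional search fees were accompanied by applicant's protest.
No protest accompanied the payment of additional search fees.

Form PCT/ISA/210 (supplemental sheet (2)) (January 1985)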



ANNEX TO THE INTERNATIONAL SEARCH REPORT
ON INTERNATIONAL PATENT APPLICATION NO. PCT/US 89/04688

SA 32472

This annex lists the patent family members relating to the patent documents cited in the above-mentioned international search report.

The members are as contained in the European Patent Office EDP file on 08/11/89.

The European Patent Office is in no way liable for these particulars which are merely given for the purpose of information.

Patent document cited in search report | Publication date | Patent family member(s) | Publication date
WO-A1-89/07647 | 24/08/89 | NONE |

For more details about this annex: see Official Journal of the European Patent Office, No. 12/82



